Title
Determining crucial factors for the popularity of scientific articles
Authors
Abstract
Using a set of over 70,000 records from the journal PLOS ONE, each described by 37 lexical, sentiment, and bibliographic variables, we apply machine learning methods to predict the popularity class of scientific papers, defined by the number of times they have been viewed. Our study reveals correlations among the features and recovers the threshold on the number of views that yields the best predictions in terms of the Matthews correlation coefficient. Moreover, by creating a variable importance plot for a random forest classifier, we are able to reduce the number of features while maintaining similar predictive performance, and to determine the crucial factors responsible for popularity.
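As a minimal sketch of the evaluation idea mentioned in the abstract, the snippet below turns view counts into a binary popularity class with a threshold and scores a set of predictions with the Matthews correlation coefficient. The view counts, the predictions, the threshold value, and the helper names `label_by_views` and `matthews_corrcoef` are all hypothetical illustrations, not the data or code used in the paper.

```python
import math


def label_by_views(views, threshold):
    """Assign class 1 (popular) to items with at least `threshold` views, else 0."""
    return [1 if v >= threshold else 0 for v in views]


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: view counts and a classifier's predicted classes.
views = [120, 3400, 560, 9800, 45, 2100]
predicted = [0, 1, 0, 1, 0, 0]

# Hypothetical threshold; the paper searches for the value that maximizes MCC.
y_true = label_by_views(views, threshold=2000)
score = matthews_corrcoef(y_true, predicted)
print(f"MCC = {score:.3f}")
```

In a threshold search of the kind the abstract describes, one would presumably repeat the labeling and scoring steps for each candidate threshold, retraining the classifier each time, and keep the threshold with the highest coefficient.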