The inconsistency of h-index: a mathematical analysis

Paper title

The inconsistency of h-index: a mathematical analysis

Authors

Ricardo Brito, Alonso Rodríguez-Navarro

Abstract

Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers that simulate real series of citations to investigate the consistency of the h index. Using the lognormal cumulative distribution function, the equation that defines the h index can be formulated; this equation shows that h has a complex dependence on the number of papers (N). We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The best correlation is for the 100 threshold but numerous data points deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper exceeding any of the citation thresholds. In contrast with the h index, the total number of citations shows a high correlation with the number of papers exceeding the thresholds of 10 and 50 citations; the mean number of citations correlates with the probability of publishing a paper that exceeds any level of citations. Thus, in synthetic series, the number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We discuss that in real citation distributions there are other difficulties.
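
The abstract does not reproduce the equation that defines h, so the sketch below rests on an assumption: in a continuous approximation, h is taken as the value at which the expected number of papers with at least h citations equals h, that is, N [1 - F(h)] = h, where F is the lognormal cumulative distribution function with parameters mu and sigma. The function names, the parameter values, and the truncation of draws to integers are all hypothetical choices made for illustration and are not taken from the paper. The code draws synthetic lognormal series, computes the empirical h index, solves the assumed equation by bisection, and counts the papers above each citation threshold.

```python
import math
import random


def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def lognormal_sf(x, mu, sigma):
    """Survival function 1 - F(x) of a lognormal distribution, for x > 0."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def expected_h(n, mu, sigma, tol=1e-9):
    """Solve n * (1 - F(h)) = h by bisection (assumed continuous approximation)."""
    lo, hi = 1e-12, float(n)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The left side decreases in h and the right side increases,
        # so the crossing point is unique.
        if n * lognormal_sf(mid, mu, sigma) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synthetic_series(n, mu, sigma, rng):
    """n lognormal draws, truncated to integers to mimic citation counts."""
    return [int(rng.lognormvariate(mu, sigma)) for _ in range(n)]


def count_above(citations, threshold):
    """Number of papers with more than `threshold` citations."""
    return sum(1 for c in citations if c > threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical (N, mu, sigma) triples, not the series used in the paper.
    for n, mu, sigma in [(200, 1.5, 1.0), (500, 2.0, 1.1), (1000, 2.5, 1.2)]:
        series = synthetic_series(n, mu, sigma, rng)
        counts = [count_above(series, t) for t in (5, 10, 50, 100, 500)]
        print(n, h_index(series), round(expected_h(n, mu, sigma), 1), counts)
```

Because N appears both as a multiplier of the survival function and implicitly through the solution h, the assumed equation has no closed-form solution in h, which is one way to read the abstract's remark that h depends on N in a complex manner.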
