Title
The inconsistency of h-index: a mathematical analysis
Authors
Abstract
Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers that simulate real series of citations to investigate the consistency of the h index. Using the lognormal cumulative distribution function, the equation that defines the h index can be formulated; this equation shows that h has a complex dependence on the number of papers (N). We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The best correlation is obtained for the 100-citation threshold, but numerous data points deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper exceeding any of the citation thresholds. In contrast with the h index, the total number of citations shows a high correlation with the number of papers exceeding the thresholds of 10 and 50 citations, and the mean number of citations correlates with the probability of publishing a paper that exceeds any level of citations. Thus, in synthetic series, the total number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We also discuss additional difficulties that arise with real citation distributions.
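The abstract states that the h index can be written using the lognormal cumulative distribution function. One natural continuous formulation, assumed here and not necessarily the exact form used in the paper, is h = N[1 - F(h)], where F(x) = Φ((ln x - μ)/σ) is the lognormal CDF and N[1 - F(h)] is the expected number of papers with at least h citations. The Python sketch below solves that equation numerically, computes the empirical h of synthetic lognormal series, and correlates the indicators discussed above. The helper names (`h_continuous`, `indicators`, `pearson`, etc.), the parameter ranges for μ, σ and N, the random seed, and the truncation of lognormal draws to integer citation counts are all hypothetical illustrations, not values or code taken from the paper.

```python
import math
import random


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF F(x) of a lognormal variable whose logarithm is Normal(mu, sigma)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def h_continuous(n_papers: int, mu: float, sigma: float) -> float:
    """Solve h = N * (1 - F(h)) by bisection on [0, N] (assumed formulation).

    g(h) = N * (1 - F(h)) - h is strictly decreasing, equals N at h = 0
    and is <= 0 at h = N, so the root lies in that interval.
    """
    lo, hi = 0.0, float(n_papers)
    for _ in range(100):  # fixed iteration count avoids float-precision stalls
        mid = 0.5 * (lo + hi)
        if n_papers * (1.0 - lognormal_cdf(mid, mu, sigma)) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def h_index(citations: list[int]) -> int:
    """Empirical h: the largest h such that h papers have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < rank:
            break
        h = rank
    return h


def synthetic_series(n_papers: int, mu: float, sigma: float, rng: random.Random) -> list[int]:
    """Lognormal draws truncated to integers to mimic citation counts (an assumption)."""
    return [int(rng.lognormvariate(mu, sigma)) for _ in range(n_papers)]


def indicators(citations: list[int], thresholds=(5, 10, 50, 100, 500)) -> dict:
    """Size-dependent and size-independent indicators for one series."""
    n = len(citations)
    h = h_index(citations)
    total = sum(citations)
    return {
        "N": n,
        "h": h,
        "h/N": h / n,
        "total": total,
        "mean": total / n,
        # number of papers with strictly more citations than each threshold
        "above": {t: sum(1 for c in citations if c > t) for t in thresholds},
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed for reproducible runs
    rows = []
    for _ in range(30):  # 30 series as in the abstract; N and mu ranges are hypothetical
        n = rng.randint(50, 2000)
        mu = rng.uniform(1.0, 2.5)
        rows.append(indicators(synthetic_series(n, mu, 1.1, rng)))

    def col(key):
        return [r[key] for r in rows]

    above100 = [r["above"][100] for r in rows]
    share100 = [r["above"][100] / r["N"] for r in rows]
    print("r(h, papers > 100)    =", round(pearson(col("h"), above100), 3))
    print("r(total, papers > 10) =", round(pearson(col("total"), [r["above"][10] for r in rows]), 3))
    print("r(h/N, share > 100)   =", round(pearson(col("h/N"), share100), 3))
    print("r(mean, share > 100)  =", round(pearson(col("mean"), share100), 3))
```

Running the script prints four correlation coefficients; their values depend entirely on the hypothetical parameters and should not be read as reproducing the paper's results. Under this formulation, h/N equals 1 - F(h), and since the root h grows with N for fixed μ and σ, h/N falls as N grows, which is one simple way to see why h depends on N in a non-trivial way.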