Paper Title
D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research
Paper Authors
Paper Abstract
DBLP is the largest open-access repository of scientific articles on computer science and provides metadata associated with publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trends in research activity, productivity, focus, bias, accessibility, and impact of computer science research. We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns. Our findings show that computer science is a growing research field (approx. 15% annually), with an active and collaborative researcher community. While papers in recent years present more bibliographical entries in comparison to previous decades, the average number of citations has been declining. Investigating papers' abstracts reveals that recent topic trends are clearly reflected in D3. Finally, we list further applications of D3 and pose supplemental research questions. The D3 dataset, our findings, and source code are publicly available for research purposes.