Title

The General Index of Software Engineering Papers

Authors

Zeinab Abou Khalil, Stefano Zacchiroli

Abstract

We introduce the General Index of Software Engineering Papers, a dataset of full-text-indexed papers from the most prominent scientific venues in the field of Software Engineering. The dataset includes both complete bibliographic information and indexed n-grams (sequences of contiguous words after removal of stopwords and non-words, for a total of 577 276 382 unique n-grams in this release) of length 1 to 5 for 44 581 papers retrieved from 34 venues over the 1971-2020 period. The dataset serves use cases in the field of meta-research, making it possible to introspect the output of software engineering research even when access to papers or scholarly search engines is not possible (e.g., due to contractual reasons). The dataset also contributes to making such analyses reproducible and independently verifiable, as opposed to what happens when they are conducted using third-party, non-open scholarly indexing services. The dataset is available as a portable Postgres database dump and released as open data.
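The abstract defines the indexed n-grams only informally. Below is a minimal Python sketch of one plausible reading of that definition. It assumes that a "word" is a lowercase, purely alphabetic token and uses a small hypothetical stopword list (`STOPWORDS`). The tokenizer and stopword list actually used to build the dataset are not specified here, and the helper names `tokenize` and `ngrams` are illustrative only.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# the stopword list used to build the dataset is not given in the abstract.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "for", "is", "are", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split on non-alphanumerics, and drop non-words and stopwords.

    Assumption: a token counts as a "word" only if it is purely alphabetic,
    so numbers such as "2020" are discarded as non-words.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]


def ngrams(words: list[str], min_n: int = 1, max_n: int = 5) -> list[tuple[str, ...]]:
    """Return every contiguous n-gram of length min_n..max_n over the filtered words.

    Contiguity is measured after filtering, so an n-gram may span a removed stopword.
    """
    out: list[tuple[str, ...]] = []
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            out.append(tuple(words[i:i + n]))
    return out


if __name__ == "__main__":
    words = tokenize("The General Index of Software Engineering Papers, 1971-2020")
    grams = ngrams(words)
    print(words)
    print(len(grams), "n-grams,", len(set(grams)), "unique")
```

For example, `tokenize("The General Index of Software Engineering Papers, 1971-2020")` keeps five words (`general`, `index`, `software`, `engineering`, `papers`), so `ngrams` yields 5 + 4 + 3 + 2 + 1 = 15 n-grams of length 1 to 5. Because stopwords are removed before the sliding window is applied, a bigram such as `("index", "software")` spans the dropped word "of". This matches the abstract's wording that contiguity is measured after removal. Counting unique n-grams across a corpus then amounts to taking the size of the union of these sets.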
