Title

Assigning credit to scientific datasets using article citation networks

Authors

Zeng, Tong; Wu, Longfeng; Bratt, Sarah; Acuna, Daniel E.

Abstract

A citation is a well-established mechanism for connecting scientific artifacts. Citation analysis uses citation networks for a variety of purposes, most prominently to give credit to scientists' work. However, because of current citation practices, scientists tend to cite only publications, leaving out other types of artifacts such as datasets. As a result, datasets do not receive appropriate credit even though they are increasingly reused and experimented with. We develop a network flow measure, called DataRank, aimed at closing this gap. DataRank assigns a relative value to each node in the network based on how citations flow through the graph, differentiating publication and dataset flow rates. We evaluate the quality of DataRank by estimating its accuracy at predicting the usage of real datasets: web visits to GenBank and downloads of Figshare datasets. We show that DataRank predicts this usage better than alternatives while offering additional interpretable outcomes. We discuss improvements to citation behavior and algorithms to properly track and assign credit to datasets.
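The abstract describes DataRank only at a high level, so the following is a rough illustration rather than the authors' method. It is a minimal Python sketch of a PageRank-style random walk in which the probability that credit keeps flowing along a citation depends on whether the current node is a publication or a dataset. The function name `datarank_sketch`, the restart-to-a-uniform-node rule, and the per-type `flow_rate` parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def datarank_sketch(edges, node_type, flow_rate, iters=200, tol=1e-12):
    """PageRank-style credit scores with a per-node-type flow rate.

    Hypothetical sketch, not the paper's exact formulation.
    edges: iterable of (citing, cited) pairs; credit flows citing -> cited.
    node_type: dict mapping EVERY node (including all edge endpoints)
        to a type label such as "pub" or "data".
    flow_rate: dict mapping each type label to the (assumed) probability
        that a walker at a node of that type follows one of its references
        instead of restarting at a uniformly random node.
    """
    nodes = list(node_type)
    n = len(nodes)
    refs = defaultdict(list)
    for citing, cited in edges:
        refs[citing].append(cited)

    score = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iters):
        new = {v: 0.0 for v in nodes}
        restart = 0.0  # probability mass that jumps to a random node
        for v in nodes:
            out = refs[v]
            # Nodes with no references (e.g. most datasets) always restart.
            rate = flow_rate[node_type[v]] if out else 0.0
            for u in out:
                new[u] += score[v] * rate / len(out)
            restart += score[v] * (1.0 - rate)
        for v in nodes:
            new[v] += restart / n
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:  # stop once the scores have stabilized
            break
    return score


# Toy graph: three papers (p1-p3) and one dataset (d1) that two papers cite.
edges = [("p1", "p2"), ("p1", "d1"), ("p2", "d1"), ("p3", "p2")]
types = {"p1": "pub", "p2": "pub", "p3": "pub", "d1": "data"}

scores = datarank_sketch(edges, types, {"pub": 0.85, "data": 0.85})
for node, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {s:.3f}")
```

In this toy graph the dataset d1, cited by two papers, receives the highest score, and lowering the publication flow rate reduces how much credit reaches it. Because d1 cites nothing, its own flow rate has no effect here. Evaluating such scores in the spirit of the abstract would mean comparing them against observed usage, such as GenBank visits or Figshare downloads.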
