Paper title
CORD-19: The COVID-19 Open Research Dataset
Paper authors
Paper abstract
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.