Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view

Paper title

Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view

Authors

Lana Yeganova, Rezarta Islamaj, Qingyu Chen, Robert Leaman, Alexis Allot, Chin-Hsuan Wei, Donald C. Comeau, Won Kim, Yifan Peng, W. John Wilbur, Zhiyong Lu

Abstract

Timely access to accurate scientific literature is critical in the battle with the ongoing COVID-19 pandemic. This unprecedented public health risk has motivated research towards understanding the disease in general, identifying drugs to treat the disease, developing potential vaccines, etc. This has given rise to a rapidly growing body of literature that, as of May 2020, doubles in number of publications every 20 days. Providing medical professionals with the means to quickly analyze the literature and discover growing areas of knowledge is necessary for addressing their questions and information needs. In this study we analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020, with the purpose of examining the landscape of the literature and presenting it in a format that facilitates information navigation and understanding. We do that by applying state-of-the-art named entity recognition (NER), classification, clustering and other NLP techniques. By applying NER tools, we capture relevant bioentities (such as diseases, internal body organs, etc.) and assess the strength of their relationship with COVID-19 by the extent to which they are discussed in the corpus. We also collect a variety of symptoms and co-morbidities discussed in reference to COVID-19. Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms. Among the topics we observe several that persist over multiple weeks and have numerous associated documents, as well as several that appear as emerging topics with fewer documents. All the tools and data are publicly available, and this framework can be applied to any literature collection. Taken together, these analyses produce a comprehensive, synthesized view of COVID-19 research to facilitate knowledge discovery from literature.
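To make the growth figure in the abstract concrete, the following is a minimal sketch of what a 20-day doubling time implies arithmetically. Only the two constants (20 days, 13,369 articles) come from the abstract; the function name `projected_count` and the assumption that the doubling rate stays constant are hypothetical and purely illustrative, not something the authors claim or compute.

```python
# Illustrative arithmetic only: assumes the 20-day doubling time stays
# constant, which the abstract reports for May 2020 but does not project.
DOUBLING_DAYS = 20      # doubling time stated in the abstract
BASE_COUNT = 13_369     # LitCovid articles as of May 15th, 2020


def projected_count(days_ahead, base=BASE_COUNT, doubling_days=DOUBLING_DAYS):
    """Hypothetical helper: article count after `days_ahead` days
    under constant exponential growth with the given doubling time."""
    return base * 2 ** (days_ahead / doubling_days)


# Equivalent per-day growth rate implied by doubling every 20 days.
daily_growth = 2 ** (1 / DOUBLING_DAYS) - 1

if __name__ == "__main__":
    print(f"Implied daily growth: {daily_growth:.2%}")
    print(f"Count after one doubling period: {projected_count(20):.0f}")
```

Under this assumption, doubling every 20 days corresponds to roughly 3.5% more articles per day, which illustrates why the abstract stresses tools for fast navigation of the literature.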
