Paper Title
A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2
Authors
Abstract
The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high-recall identification of these important entities difficult. In this manuscript, we present an extensive dictionary of terms used in the literature to refer to SARS-CoV-2 and COVID-19. We use a rule-based approach to iteratively generate new term variants, then locate these variants in a large text corpus. We compare our dictionary to an extensive collection of terminological resources, demonstrating that our resource provides a substantial number of additional terms. We use our dictionary to analyze the usage of SARS-CoV-2 and COVID-19 terms over time and show that the number of unique terms continues to grow rapidly. Our dictionary is freely available at https://github.com/ncbi-nlp/CovidTermVar.
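The abstract only summarizes the method, so the following is a minimal sketch of one plausible reading of "iteratively generate new term variants, then locate these variants in a large text corpus", followed by a count of unique terms over time. It is not the authors' implementation. The rewrite rules (`space_first_hyphen`, `drop_first_hyphen`, `drop_year_suffix`), the helper names, the toy three-document corpus, and the monthly period labels are all hypothetical. Each iteration applies the rules only to newly attested terms and keeps a candidate only if it actually occurs in the corpus, which keeps the variant set from growing combinatorially.

```python
import re
from collections import defaultdict

# Hypothetical rewrite rules (illustrative assumptions, NOT the authors' rules).
# Each rule maps one term to a set of new surface forms.
def space_first_hyphen(term):
    return {term.replace("-", " ", 1)} if "-" in term else set()

def drop_first_hyphen(term):
    return {term.replace("-", "", 1)} if "-" in term else set()

def drop_year_suffix(term):
    # e.g. "COVID-19" -> "COVID"
    return {term[:-3]} if term.endswith("-19") else set()

RULES = [space_first_hyphen, drop_first_hyphen, drop_year_suffix]

def build_matcher(terms):
    # Longest terms first, so at a given position "COVID-19" wins over "COVID".
    ordered = sorted(terms, key=lambda t: (-len(t), t))
    alternation = "|".join(re.escape(t) for t in ordered)
    # Case-insensitive; a match may not sit inside a longer alphanumeric token.
    return re.compile(r"(?<![A-Za-z0-9])(?:" + alternation + r")(?![A-Za-z0-9])",
                      re.IGNORECASE)

def expand_dictionary(seeds, texts, rules=RULES, max_iter=10):
    """Apply rules to newly found terms; keep only candidates attested in texts."""
    dictionary, frontier = set(seeds), set(seeds)
    for _ in range(max_iter):
        candidates = {v for term in frontier for rule in rules for v in rule(term)}
        candidates -= dictionary
        if not candidates:
            break
        # Match against known terms too, so a longer known term shadows a
        # shorter candidate at the same position.
        matcher = build_matcher(dictionary | candidates)
        found = {m.group(0).lower() for text in texts for m in matcher.finditer(text)}
        attested = {c for c in candidates if c.lower() in found}
        if not attested:
            break
        dictionary |= attested
        frontier = attested  # next round expands only the new terms
    return dictionary

def unique_terms_by_period(docs, dictionary):
    """docs: (period, text) pairs; returns per-period term sets and cumulative counts."""
    matcher = build_matcher(dictionary)
    per_period = defaultdict(set)
    for period, text in docs:
        per_period[period].update(m.group(0) for m in matcher.finditer(text))
    seen, cumulative = set(), {}
    for period in sorted(per_period):
        seen |= per_period[period]
        cumulative[period] = len(seen)  # distinct terms seen up to this period
    return dict(per_period), cumulative

# Toy corpus (hypothetical sentences, not data from the paper).
docs = [
    ("2020-01", "A novel coronavirus, SARS-CoV-2, causes COVID-19."),
    ("2020-02", "Patients with COVID 19 and SARS CoV-2 infection were studied."),
    ("2020-03", "We model SARS CoV 2 spread; COVID cases rose."),
]
terms = expand_dictionary({"SARS-CoV-2", "COVID-19"}, [text for _, text in docs])
per_period, cumulative = unique_terms_by_period(docs, terms)
print(cumulative)  # {'2020-01': 2, '2020-02': 4, '2020-03': 6}
```

On this toy corpus, "SARS CoV 2" is only reached on the second iteration (via "SARS CoV-2"), which is what makes the procedure iterative, while unattested candidates such as "SARSCoV-2" are discarded. Longest-first alternation is a simple way to avoid counting "COVID" inside "COVID 19"; a real pipeline would need more careful tokenization and normalization.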