Paper Title

Zero-Shot and Few-Shot Classification of Biomedical Articles in Context of the COVID-19 Pandemic

Authors

Simon Lupart, Benoit Favre, Vassilina Nikoulina, Salah Ait-Mokhtar

Abstract

MeSH (Medical Subject Headings) is a large thesaurus created by the National Library of Medicine and used for fine-grained indexing of publications in the biomedical domain. In the context of the COVID-19 pandemic, MeSH descriptors have emerged in relation to articles published on the corresponding topic. Zero-shot classification is an adequate response for timely labeling of the stream of papers with MeSH categories. In this work, we hypothesise that the rich semantic information available in MeSH has the potential to improve BioBERT representations and make them more suitable for zero-shot/few-shot tasks. We frame the problem as determining whether MeSH term definitions concatenated with paper abstracts are valid instances, and leverage multi-task learning to induce the MeSH hierarchy in the representations through a seq2seq task. Results establish a baseline on the MedLine and LitCovid datasets, and probing shows that the resulting representations convey the hierarchical relations present in MeSH.
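
The pairing formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `PairInstance` type, the `build_instances` helper, the `[SEP]` separator string, the definition-before-abstract order, and the placeholder definitions are all assumptions made for illustration. Each candidate MeSH term definition is concatenated with one abstract, and the pair is labeled 1 if the term is annotated for that article and 0 otherwise.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed separator; a real BERT-style tokenizer would insert its own special tokens.
SEP = " [SEP] "


@dataclass(frozen=True)
class PairInstance:
    """One (definition, abstract) pair with a binary validity label (hypothetical type)."""
    term: str
    text: str
    label: int


def build_instances(
    abstract: str,
    gold_terms: Set[str],
    definitions: Dict[str, str],
) -> List[PairInstance]:
    """Pair one abstract with every candidate MeSH term definition.

    label = 1 if the term is annotated for the article, else 0.
    Terms are visited in sorted order so the output is deterministic.
    """
    instances = []
    for term in sorted(definitions):
        text = definitions[term] + SEP + abstract
        label = 1 if term in gold_terms else 0
        instances.append(PairInstance(term, text, label))
    return instances


if __name__ == "__main__":
    # Placeholder strings, not actual MeSH definitions or real abstracts.
    toy_definitions = {
        "COVID-19": "placeholder definition for COVID-19",
        "Asthma": "placeholder definition for Asthma",
    }
    for inst in build_instances("placeholder abstract text", {"COVID-19"}, toy_definitions):
        print(inst.label, inst.term)
```

Because a term is represented by its definition text rather than by a fixed output index, a descriptor that never appeared in training can still be scored at inference time, which is what makes this framing usable in a zero-shot setting.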
