MIST: a Large-Scale Annotated Resource and Neural Models for Functions of Modal Verbs in English Scientific Text

Paper Title

MIST: a Large-Scale Annotated Resource and Neural Models for Functions of Modal Verbs in English Scientific Text

Authors

Sophie Henning, Nicole Macher, Stefan Grünewald, Annemarie Friedrich

Abstract

Modal verbs (e.g., "can", "should", or "must") occur very frequently in scientific articles. Decoding their function is not straightforward: they are often used for hedging, but they may also denote abilities and restrictions. Understanding their meaning is important for various NLP tasks such as writing assistance or accurate information extraction from scientific text. To foster research on the usage of modals in this genre, we introduce the MIST (Modals In Scientific Text) dataset, which contains 3737 modal instances in five scientific domains annotated for their semantic, pragmatic, or rhetorical function. We systematically evaluate a set of competitive neural architectures on MIST. Transfer experiments reveal that leveraging non-scientific data is of limited benefit for modeling the distinctions in MIST. Our corpus analysis provides evidence that scientific communities differ in their usage of modal verbs, yet classifiers trained on scientific data generalize to some extent to unseen scientific domains.
