Paper Title
Semantics Altering Modifications for Evaluating Comprehension in Machine Reading
Paper Authors
Paper Abstract
Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that -- despite their well-known remarkable performance -- optimised models consistently struggle to correctly process semantically altered data.
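To make the notion of an aligned SAM pair and a pair-wise evaluation concrete, below is a minimal, hypothetical Python sketch. The tiny dataset, the `joint_consistency` metric, and the naive baseline are illustrative assumptions for exposition only, not the paper's released data, metric, or implementation; the general idea is that a model is credited for an aligned pair only if it answers both the original and the semantically altered version correctly.

```python
# Illustrative sketch (not the paper's code): aligned original/altered (SAM)
# examples and a simple joint-consistency score. Crediting a model only when it
# answers BOTH versions of a pair correctly helps separate genuine comprehension
# from pattern matching on the shared lexical surface form.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Example:
    passage: str
    question: str
    answer: str

# One aligned pair: the altered passage keeps most of the surface form but the
# inserted negation flips the semantics, so the correct answer changes.
ALIGNED_PAIRS: List[Tuple[Example, Example]] = [
    (
        Example("Mia arrived at the stadium before Noah.",
                "Who arrived at the stadium first?", "Mia"),
        Example("Mia did not arrive at the stadium before Noah.",
                "Who arrived at the stadium first?", "Noah"),
    ),
]

def joint_consistency(predict: Callable[[str, str], str]) -> float:
    """Fraction of aligned pairs where the model answers both versions correctly."""
    correct_pairs = 0
    for original, altered in ALIGNED_PAIRS:
        ok_original = predict(original.passage, original.question) == original.answer
        ok_altered = predict(altered.passage, altered.question) == altered.answer
        correct_pairs += ok_original and ok_altered
    return correct_pairs / len(ALIGNED_PAIRS)

if __name__ == "__main__":
    # A surface-pattern "model" that ignores the modification: it always returns
    # the first mentioned entity, so it solves the original but fails the pair.
    naive = lambda passage, question: passage.split()[0].rstrip(".,")
    print(f"Joint consistency of the naive baseline: {joint_consistency(naive):.2f}")
```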