Paper Title
FLUTE: Figurative Language Understanding through Textual Explanations
Paper Authors
Paper Abstract
Figurative language understanding has recently been framed as a recognizing textual entailment (RTE) task (a.k.a. natural language inference, or NLI). However, like classical RTE/NLI datasets, the current benchmarks suffer from spurious correlations and annotation artifacts. To tackle this problem, work on NLI has built explanation-based datasets such as e-SNLI, allowing us to probe whether language models are right for the right reasons. Yet no such data exists for figurative language, making it harder to assess genuine understanding of such expressions. To address this issue, we release FLUTE, a dataset of 9,000 figurative NLI instances with explanations, spanning four categories: Sarcasm, Simile, Metaphor, and Idioms. We collect the data through a model-in-the-loop framework based on GPT-3, crowd workers, and expert annotators. We show how using GPT-3 in conjunction with human annotators (novices and experts) can help scale up dataset creation even for complex linguistic phenomena such as figurative language. The baseline performance of the T5 model fine-tuned on FLUTE shows that our dataset can bring us a step closer to developing models that understand figurative language through textual explanations.
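To make the shape of a figurative NLI instance with an explanation concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than FLUTE's released format or the authors' T5 setup: the field names (`premise`, `hypothesis`, `label`, `explanation`, `category`), the assumed two-label set, the prompt template in the hypothetical helper `to_text_pair`, and the example sentences, which were written for this sketch and are not taken from the dataset.

```python
from dataclasses import dataclass

# Assumed category and label names (hypothetical); the real dataset's
# field names and label strings may differ.
CATEGORIES = {"sarcasm", "simile", "metaphor", "idiom"}
LABELS = {"entailment", "contradiction"}


@dataclass(frozen=True)
class FigurativeNLIInstance:
    """One hypothetical figurative NLI example paired with a textual explanation."""
    premise: str
    hypothesis: str
    label: str
    explanation: str
    category: str

    def __post_init__(self) -> None:
        # Reject values outside the assumed label and category sets.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def to_text_pair(ex: FigurativeNLIInstance) -> tuple[str, str]:
    """Build an illustrative (source, target) string pair for a seq2seq model.

    The target holds the label followed by the explanation, so a model
    trained on it must produce both a prediction and a justification.
    """
    source = f"premise: {ex.premise} hypothesis: {ex.hypothesis}"
    target = f"{ex.label}. explanation: {ex.explanation}"
    return source, target


# Invented example, not drawn from FLUTE.
example = FigurativeNLIInstance(
    premise="I love being stuck in traffic for two hours.",
    hypothesis="I hate being stuck in traffic for two hours.",
    label="entailment",
    explanation="The speaker is being sarcastic: two hours in traffic is unpleasant, so they actually hate it.",
    category="sarcasm",
)
print(to_text_pair(example))
```

Pairing the label with the explanation in a single target string is one simple way to train a text-to-text model to justify its predictions; it is shown only as a design illustration.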