Paper Title

Modeling Exemplification in Long-form Question Answering via Retrieval

Paper Authors

Shufan Wang, Fangyuan Xu, Laure Thompson, Eunsol Choi, Mohit Iyyer

Paper Abstract

Exemplification is a process by which writers explain or clarify a concept by providing an example. While common in all forms of writing, exemplification is particularly useful in the task of long-form question answering (LFQA), where a complicated answer can be made more understandable through simple examples. In this paper, we provide the first computational study of exemplification in QA, performing a fine-grained annotation of different types of examples (e.g., hypotheticals, anecdotes) in three corpora. We show not only that state-of-the-art LFQA models struggle to generate relevant examples, but also that standard evaluation metrics such as ROUGE are insufficient to judge exemplification quality. We propose to treat exemplification as a retrieval problem in which a partially-written answer is used to query a large set of human-written examples extracted from a corpus. Our approach allows reliable ranking-based automatic metrics that correlate well with human evaluation. A human evaluation shows that our model's retrieved examples are more relevant than examples generated by a state-of-the-art LFQA model.
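
To make the retrieval formulation concrete, below is a minimal sketch that ranks a pool of candidate examples against a partially-written answer. It uses off-the-shelf TF-IDF cosine similarity as a stand-in scorer; the candidate pool and query here are invented for illustration, and the paper's actual trained retriever, corpus, and evaluation metric are not reproduced.

```python
# Sketch of exemplification-as-retrieval: given a partially-written answer,
# rank candidate human-written examples by similarity to the answer prefix.
# TF-IDF cosine similarity is only a stand-in for the paper's trained retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical candidate pool (in the paper, examples are extracted from a
# large corpus of human-written answers).
example_pool = [
    "For instance, imagine dropping a ball inside a moving train ...",
    "A classic anecdote is Newton watching an apple fall from a tree ...",
    "Suppose you flip a fair coin one hundred times in a row ...",
]

# The partially-written answer serves as the retrieval query.
partial_answer = (
    "Objects keep their horizontal velocity, so a ball dropped inside a train"
)

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(example_pool)
query_vec = vectorizer.transform([partial_answer])

# Rank examples by cosine similarity to the partial answer, highest first.
scores = cosine_similarity(query_vec, doc_matrix)[0]
for score, example in sorted(zip(scores, example_pool), reverse=True):
    print(f"{score:.3f}  {example}")
```

Because retrieval produces a ranking over a fixed candidate set, it admits ranking-based automatic metrics (e.g., how highly the gold example is ranked), which is the property the abstract contrasts with generation metrics such as ROUGE.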
