Paper Title

Modeling Exemplification in Long-form Question Answering via Retrieval

Paper Authors

Shufan Wang, Fangyuan Xu, Laure Thompson, Eunsol Choi, Mohit Iyyer

Paper Abstract

Exemplification is a process by which writers explain or clarify a concept by providing an example. While common in all forms of writing, exemplification is particularly useful in the task of long-form question answering (LFQA), where a complicated answer can be made more understandable through simple examples. In this paper, we provide the first computational study of exemplification in QA, performing a fine-grained annotation of different types of examples (e.g., hypotheticals, anecdotes) in three corpora. We show not only that state-of-the-art LFQA models struggle to generate relevant examples, but also that standard evaluation metrics such as ROUGE are insufficient to judge exemplification quality. We propose to treat exemplification as a retrieval problem in which a partially-written answer is used to query a large set of human-written examples extracted from a corpus. Our approach allows reliable ranking-based automatic metrics that correlate well with human evaluation. A human evaluation shows that our model's retrieved examples are more relevant than examples generated by a state-of-the-art LFQA model.
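
To make the retrieval formulation concrete, below is a minimal sketch that ranks a pool of candidate examples against a partially-written answer. It uses off-the-shelf TF-IDF cosine similarity as a stand-in scorer; the candidate pool and query here are invented for illustration, and the paper's actual trained retriever, corpus, and evaluation metric are not reproduced.

```python
# Sketch of exemplification-as-retrieval: given a partially-written answer,
# rank candidate human-written examples by similarity to the answer prefix.
# TF-IDF cosine similarity is only a stand-in for the paper's trained retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical candidate pool (in the paper, examples are extracted from a
# large corpus of human-written answers).
example_pool = [
    "For instance, imagine dropping a ball inside a moving train ...",
    "A classic anecdote is Newton watching an apple fall from a tree ...",
    "Suppose you flip a fair coin one hundred times in a row ...",
]

# The partially-written answer serves as the retrieval query.
partial_answer = (
    "Objects keep their horizontal velocity, so a ball dropped inside a train"
)

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(example_pool)
query_vec = vectorizer.transform([partial_answer])

# Rank examples by cosine similarity to the partial answer, highest first.
scores = cosine_similarity(query_vec, doc_matrix)[0]
for score, example in sorted(zip(scores, example_pool), reverse=True):
    print(f"{score:.3f}  {example}")
```

Because retrieval produces a ranking over a fixed candidate set, it admits ranking-based automatic metrics (e.g., how highly the gold example is ranked), which is the property the abstract contrasts with generation metrics such as ROUGE.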
