Paper Title

ASQA: Factoid Questions Meet Long-Form Answers

Paper Authors

Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, Ming-Wei Chang

Paper Abstract

An abundance of datasets and availability of reliable evaluation metrics have resulted in strong progress in factoid question answering (QA). This progress, however, does not easily transfer to the task of long-form QA, where the goal is to answer questions that require in-depth explanations. The hurdles include (i) a lack of high-quality data, and (ii) the absence of a well-defined notion of the answer's quality. In this work, we address these problems by (i) releasing a novel dataset and a task that we call ASQA (Answer Summaries for Questions which are Ambiguous); and (ii) proposing a reliable metric for measuring performance on ASQA. Our task focuses on factoid questions that are ambiguous, that is, have different correct answers depending on interpretation. Answers to ambiguous questions should synthesize factual information from multiple sources into a long-form summary that resolves the ambiguity. In contrast to existing long-form QA tasks (such as ELI5), ASQA admits a clear notion of correctness: a user faced with a good summary should be able to answer different interpretations of the original ambiguous question. We use this notion of correctness to define an automated metric of performance for ASQA. Our analysis demonstrates an agreement between this metric and human judgments, and reveals a considerable gap between human performance and strong baselines.
