Title
Multi-Type Conversational Question-Answer Generation with Closed-ended and Unanswerable Questions
Authors
Abstract
Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult in many domains because of data scarcity. In this paper, we introduce a novel method to synthesize CQA data with various question types, including open-ended, closed-ended, and unanswerable questions. We design a separate generation flow for each question type and effectively combine them in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while also acquiring unanswerable questions. Manual inspection shows that synthetic data generated with our framework has characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data indeed perform well, approaching the performance of systems trained on human-annotated data.