Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

Paper title

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

Authors

Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou

Abstract

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
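The three steps named in the abstract (map a question into a continuous space, revise it by gradient steps against a scoring model, map it back to tokens) can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D embedding table `EMBED`, the linear scorer weights `W` and `B`, the stopping margin, and the nearest-neighbour `decode` are all hypothetical stand-ins chosen for illustration. In particular, a logistic scorer replaces the pre-trained MRC model and a nearest-neighbour lookup replaces the Transformer autoencoder's decoder.

```python
import math

# Hypothetical toy vocabulary with 2-D embeddings (illustrative values only).
EMBED = {
    "who": [1.0, 0.0],
    "wrote": [0.8, 0.6],
    "hamlet": [0.9, 0.2],
    "painted": [-0.8, 0.6],
    "guernica": [-0.9, 0.2],
}

# Toy stand-in for a pre-trained MRC answerability head:
# p = sigmoid(W . mean(embeddings) + B). Weights are assumptions.
W = [2.0, 0.0]
B = 0.0


def encode(tokens):
    """Map discrete tokens to continuous vectors (copied so they can be edited)."""
    return [list(EMBED[t]) for t in tokens]


def answerable_prob(vectors):
    """Probability that the question is answerable under the toy scorer."""
    n = len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(len(W))]
    logit = sum(w * m for w, m in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-logit))


def revise(vectors, target, lr=1.0, max_steps=50):
    """Iteratively move the embeddings so the scorer's output approaches `target`."""
    vectors = [list(v) for v in vectors]
    n = len(vectors)
    for _ in range(max_steps):
        p = answerable_prob(vectors)
        if abs(p - target) < 0.2:  # hypothetical stopping margin
            break
        # Gradient of binary cross-entropy w.r.t. each embedding e_i
        # for the mean-pooled logistic scorer: (p - target) * W / n.
        for v in vectors:
            for d in range(len(W)):
                v[d] -= lr * (p - target) * W[d] / n
    return vectors


def decode(vectors):
    """Map each revised vector back to its nearest vocabulary token."""
    def nearest(v):
        return min(
            EMBED,
            key=lambda t: sum((a - b) ** 2 for a, b in zip(EMBED[t], v)),
        )
    return [nearest(v) for v in vectors]


question = ["who", "wrote", "hamlet"]
revised = revise(encode(question), target=0.0)  # push toward "unanswerable"
rewritten = decode(revised)
```

Here `target=0.0` asks for a question the toy scorer treats as unanswerable. Only the embeddings are updated while the scorer stays fixed, which mirrors the abstract's description of revising the question representation rather than retraining the model.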
