Paper Title
Fact-based Text Editing
Paper Authors
Paper Abstract
We propose a novel text editing task, referred to as \textit{fact-based text editing}, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called \textsc{FactEditor}, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to address the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that \textsc{FactEditor} outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that \textsc{FactEditor} conducts inference faster than the encoder-decoder approach.
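To make the setup concrete, below is a minimal Python sketch of the kind of instance the datasets contain (a draft text, a revised text, and facts given as triples) and of an editing loop organized around a buffer, a stream, and a memory, as mentioned in the abstract. The `EditingInstance` fields, the rule in `toy_fact_editor`, the `KEEP_ANYWAY` word list, and the example sentences are assumptions made for illustration only; the paper's \textsc{FactEditor} is a trained neural model, not this rule-based procedure.

```python
# Illustrative sketch only: the instance fields, the keep/drop rule, and the
# toy keep-list below are assumptions for this example, not the paper's
# FactEditor model, which is a trained neural network.

from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class EditingInstance:
    draft: str            # draft text to be revised
    revised: str          # reference revision used for training/evaluation
    facts: List[Triple]   # facts the revised text should describe


# Crude stand-in for tokens a fluent revision keeps regardless of the facts
# (an assumption for this sketch, not part of the paper).
KEEP_ANYWAY = {"was", "born", "in", "the", "a", "of", "."}


def toy_fact_editor(draft: str, facts: List[Triple]) -> str:
    """Toy editing loop over a buffer, a stream, and a memory.

    buffer: draft tokens not yet processed
    stream: tokens emitted so far
    memory: surface words taken from the fact triples
    """
    memory = {
        word.lower()
        for triple in facts
        for element in triple
        for word in element.replace("_", " ").split()
    }
    buffer = draft.split()      # tokens waiting to be processed
    stream: List[str] = []      # tokens already emitted

    while buffer:
        token = buffer.pop(0)
        if token.lower() in memory or token.lower() in KEEP_ANYWAY:
            stream.append(token)   # keep: supported by the facts (or neutral)
        # else drop: no support in the facts, so leave it out.
        # A real editor would also need generation-style actions to insert
        # text for facts that the draft does not mention at all.

    return " ".join(stream)


if __name__ == "__main__":
    instance = EditingInstance(
        draft="Alan Turing was born in London and Paris .",
        revised="Alan Turing was born in London .",
        facts=[("Alan_Turing", "birth_place", "London")],
    )
    print(toy_fact_editor(instance.draft, instance.facts))
    # -> Alan Turing was born in London .
```

The sketch only drops unsupported content; fidelity to facts missing from the draft, and the fluency of the result, are exactly what a learned editor such as \textsc{FactEditor} is needed for.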