Paper Title

Learning to Simplify with Data Hopelessly Out of Alignment

Paper Authors

Nomoto, Tadashi

Paper Abstract

We consider whether it is possible to do text simplification without relying on a "parallel" corpus, one that is made up of sentence-by-sentence alignments of complex and ground-truth simple sentences. To this end, we introduce a number of concepts, some new and some not, including what we call Conjoined Twin Networks, Flip-Flop Auto-Encoders (FFA), and Adversarial Networks (GAN). A comparison is made between Jensen-Shannon GAN (JS-GAN) and Wasserstein GAN to see how they impact performance, with stronger results for the former. An experiment we conducted with a large dataset derived from Wikipedia found a solid superiority of Twin Networks equipped with FFA and JS-GAN over the current best-performing system. Furthermore, we discuss where we stand in relation to fully supervised methods in the past literature, and highlight with examples the qualitative differences that exist among simplified sentences generated by supervision-free systems.
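
For reference, the two adversarial objectives contrasted in the abstract are the standard GAN formulations; the sketch below uses the textbook notation (generator G, discriminator/critic D, data distribution p_data, noise prior p_z) and is not drawn from the paper's own equations or architecture.

JS-GAN (original minimax objective; at the optimal discriminator it amounts to minimizing the Jensen-Shannon divergence between the data and generator distributions):

\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

Wasserstein GAN (the critic is constrained to be 1-Lipschitz, and the objective approximates the Wasserstein-1 distance):

\min_G \max_{\|D\|_L \le 1} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[D(x)\right] - \mathbb{E}_{z \sim p_z}\left[D(G(z))\right]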
