Paper Title

Learning to Simplify with Data Hopelessly Out of Alignment

Paper Authors

Nomoto, Tadashi

Paper Abstract

We consider whether it is possible to do text simplification without relying on a "parallel" corpus, one that is made up of sentence-by-sentence alignments of complex and ground-truth simple sentences. To this end, we introduce a number of concepts, some new and some not, including what we call Conjoined Twin Networks, Flip-Flop Auto-Encoders (FFA), and Adversarial Networks (GAN). A comparison is made between Jensen-Shannon GAN (JS-GAN) and Wasserstein GAN to see how they impact performance, with stronger results for the former. An experiment we conducted with a large dataset derived from Wikipedia found a solid superiority of Twin Networks equipped with FFA and JS-GAN over the current best-performing system. Furthermore, we discuss where we stand in relation to fully supervised methods in the past literature, and highlight with examples the qualitative differences that exist among simplified sentences generated by supervision-free systems.
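
For reference, the two adversarial objectives contrasted in the abstract are the standard GAN formulations; the sketch below uses the textbook notation (generator G, discriminator/critic D, data distribution p_data, noise prior p_z) and is not drawn from the paper's own equations or architecture.

JS-GAN (original minimax objective; at the optimal discriminator it amounts to minimizing the Jensen-Shannon divergence between the data and generator distributions):

\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

Wasserstein GAN (the critic is constrained to be 1-Lipschitz, and the objective approximates the Wasserstein-1 distance):

\min_G \max_{\|D\|_L \le 1} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[D(x)\right] - \mathbb{E}_{z \sim p_z}\left[D(G(z))\right]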
