Paper title

Systematically Exploring Redundancy Reduction in Summarizing Long Documents

Authors

Wen Xiao, Giuseppe Carenini

Abstract

Our analysis of large summarization datasets indicates that redundancy is a very serious problem when summarizing long documents. Yet, redundancy reduction has not been thoroughly investigated in neural summarization. In this work, we systematically explore and compare different ways to deal with redundancy when summarizing long documents. Specifically, we organize the existing methods into categories based on when and how the redundancy is considered. Then, in the context of these categories, we propose three additional methods balancing non-redundancy and importance in a general and flexible way. In a series of experiments, we show that our proposed methods achieve state-of-the-art ROUGE scores on two scientific paper datasets, PubMed and arXiv, while reducing redundancy significantly.
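The abstract does not spell out the proposed methods. As a generic illustration of what "balancing non-redundancy and importance" can mean in extractive summarization, the sketch below implements greedy Maximal Marginal Relevance (MMR), a classic selection rule of this kind; it is not presented as the authors' method. The per-sentence importance scores, the Jaccard word-overlap similarity used as the redundancy measure, and the trade-off weight `lam` are all illustrative assumptions.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity, used here as an assumed stand-in redundancy measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_select(
    sentences: Sequence[str],
    importance: Sequence[float],
    k: int,
    lam: float = 0.7,
) -> List[int]:
    """Greedy MMR: repeatedly pick the sentence maximizing
    lam * importance[i] - (1 - lam) * max_{j in selected} sim(i, j).

    lam = 1.0 ignores redundancy entirely; smaller values penalize
    overlap with already-selected sentences more strongly.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    selected: List[int] = []
    candidates = set(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy = highest similarity to any sentence already chosen.
            redundancy = max(
                (jaccard(tokens[i], tokens[j]) for j in selected), default=0.0
            )
            return lam * importance[i] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy input: sentence 1 nearly repeats sentence 0.
sentences = [
    "redundancy is a serious problem in long documents",
    "redundancy is a very serious problem in long documents",
    "we propose methods that balance importance and redundancy",
]
importance = [0.9, 0.85, 0.6]

print(mmr_select(sentences, importance, k=2))  # → [0, 2]
```

With `lam=1.0` the same call returns `[0, 1]`, i.e. the two near-duplicate sentences, because only importance is considered; lowering `lam` trades a little importance for a less redundant summary.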