Paper Title
SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression
Paper Authors
Paper Abstract
Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representations into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on the Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
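The abstract outlines a three-step pipeline: build a sentence graph, cluster its nodes with spectral clustering, and compress each cluster into a summary sentence. The following is a minimal sketch of that shape using off-the-shelf components, not the authors' implementation: the function name summpip_sketch is hypothetical, sentence similarity is approximated with TF-IDF cosine similarity rather than the paper's combined linguistic and deep representations, and the compression step is reduced to picking the most central sentence of each cluster.

# Minimal sketch of a SummPip-style pipeline (illustrative only; the paper
# uses linguistic + deep sentence representations and multi-sentence
# compression, which are simplified away here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


def summpip_sketch(sentences, num_clusters=9):
    # 1. Sentence graph: nodes are sentences, edge weights are pairwise
    #    similarities, stored as an affinity matrix.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(tfidf)

    # 2. Spectral clustering over the precomputed affinity matrix groups
    #    sentences covering the same content.
    labels = SpectralClustering(
        n_clusters=num_clusters, affinity="precomputed", random_state=0
    ).fit_predict(affinity)

    # 3. "Compress" each cluster: keep the sentence most similar to the
    #    rest of its cluster (a stand-in for true cluster compression).
    summary = []
    for c in range(num_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        centrality = affinity[np.ix_(idx, idx)].sum(axis=1)
        summary.append(sentences[idx[np.argmax(centrality)]])
    return " ".join(summary)

The number of clusters directly controls summary length, which matches the paper's framing of clustering as the step that decides how many summary sentences are produced; here it is simply a parameter passed by the caller.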