Text Summarization with Oracle Expectation

Paper Title

Text Summarization with Oracle Expectation

Authors

Yumo Xu, Mirella Lapata

Abstract

Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document. Since most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy, different labeling algorithms have been proposed to extrapolate oracle extracts for model training. In this work, we identify two flaws with the widely used greedy labeling approach: it delivers suboptimal and deterministic oracles. To alleviate both issues, we propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels. We define a new learning objective for extractive summarization which incorporates learning signals from multiple oracle summaries and prove it is equivalent to estimating the oracle expectation for each document sentence. Without any architectural modifications, the proposed labeling scheme achieves superior performance on a variety of summarization benchmarks across domains and languages, in both supervised and zero-shot settings.
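The contrast between hard greedy labels and soft, expectation-based labels can be illustrated with a toy sketch. This is not the authors' procedure: the abstract does not say how oracle summaries are generated or weighted, so the sketch assumes exhaustive enumeration of small sentence subsets, a unigram-overlap F1 as a stand-in for ROUGE, and a softmax over candidate scores. All function names and the `temperature` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def overlap_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 (an assumed, simplified stand-in for ROUGE)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def score_extract(sentences, indices, reference):
    """Score the extract formed by concatenating the selected sentences."""
    tokens = [tok for i in indices for tok in sentences[i]]
    return overlap_f1(tokens, reference)


def greedy_labels(sentences, reference, max_sentences=3):
    """Greedy labeling: repeatedly add the sentence that most improves the
    score and stop when nothing improves it. Yields one hard 0/1 oracle."""
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        gains = [(score_extract(sentences, selected + [i], reference), i)
                 for i in range(len(sentences)) if i not in selected]
        if not gains:
            break
        score, idx = max(gains)  # score ties go to the larger index
        if score <= best:
            break
        selected.append(idx)
        best = score
    return [1 if i in selected else 0 for i in range(len(sentences))]


def expectation_labels(sentences, reference, max_sentences=2, temperature=0.1):
    """Soft labels: expected membership of each sentence under a softmax
    distribution over all candidate extracts (assumed weighting scheme)."""
    n = len(sentences)
    candidates = [c for k in range(1, max_sentences + 1)
                  for c in combinations(range(n), k)]
    scores = [score_extract(sentences, c, reference) for c in candidates]
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    labels = [0.0] * n
    for cand, w in zip(candidates, weights):
        for i in cand:
            labels[i] += w / total
    return labels
```

On a three-sentence toy document where two sentences together cover the reference, `greedy_labels` marks exactly those two with 1, while `expectation_labels` gives each sentence a value in [0, 1] that reflects how often it appears in high-scoring extracts. Exhaustive enumeration is only feasible for very short documents; a practical system would need a search procedure to propose candidates.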
