Paper Title

Towards Understanding Omission in Dialogue Summarization

Paper Authors

Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li, Tao Gui

Paper Abstract

Dialogue summarization aims to condense a lengthy dialogue into a concise summary, and has recently achieved significant progress. However, the results of existing methods are still far from satisfactory. Previous works indicated that omission is a major factor affecting the quality of summarization, but few of them have further explored the omission problem, such as how omission affects summarization results and how to detect omission, which is critical for reducing omission and improving summarization quality. Moreover, analyzing and detecting omission relies on summarization datasets with omission labels (i.e., which dialogue utterances are omitted in the summarization), which are not available in the current literature. In this paper, we propose the OLDS dataset, which provides high-quality Omission Labels for Dialogue Summarization. By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization. Therefore, we formulate an omission detection task and demonstrate that our proposed dataset can support the training and evaluation of this task well. We also call for research action on omission detection based on our proposed dataset. Our dataset and code are publicly available.
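The abstract defines omission labels at the utterance level (which dialogue utterances are left out of a candidate summary) and formulates omission detection over them. As a rough illustration only, the sketch below frames omission detection as per-utterance binary classification and applies a naive lexical-overlap baseline; the record fields, the example data, and the baseline are hypothetical and do not reflect the actual OLDS schema or any method proposed in the paper.

```python
# Minimal sketch: omission detection as per-utterance binary classification.
# All field names and the baseline below are illustrative assumptions,
# not the OLDS dataset schema or the paper's detection models.
from dataclasses import dataclass
from typing import List, Set


@dataclass
class OmissionExample:
    dialogue: List[str]          # dialogue utterances, in order
    summary: str                 # candidate summary produced by some model
    omission_labels: List[int]   # 1 if the utterance's content is omitted from the summary


def _content_tokens(text: str) -> Set[str]:
    """Lowercase, lightly de-punctuated token set."""
    return {tok.strip(".,?!:").lower() for tok in text.split()}


def lexical_overlap_baseline(example: OmissionExample, threshold: float = 0.3) -> List[int]:
    """Naive baseline: flag an utterance as omitted (1) if it shares
    few content tokens with the candidate summary."""
    summary_tokens = _content_tokens(example.summary)
    predictions = []
    for utterance in example.dialogue:
        tokens = _content_tokens(utterance)
        overlap = len(tokens & summary_tokens) / max(len(tokens), 1)
        predictions.append(1 if overlap < threshold else 0)
    return predictions


if __name__ == "__main__":
    example = OmissionExample(
        dialogue=[
            "Alice: Can we move the meeting to 3pm?",
            "Bob: Sure, 3pm works for me.",
            "Alice: Also, please bring the budget report.",
        ],
        summary="Alice and Bob agree to move the meeting to 3pm.",
        omission_labels=[0, 0, 1],
    )
    print(lexical_overlap_baseline(example))  # per-utterance omission predictions
```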
