Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization

Paper title

Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization

Authors

Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin, Furu Wei

Abstract

In zero-shot multilingual extractive text summarization, a model is typically trained on an English summarization dataset and then applied to summarization datasets in other languages. Given English gold summaries and documents, sentence-level labels for extractive summarization are usually generated using heuristics. However, these monolingual labels created on English datasets may not be optimal for datasets in other languages, because of syntactic or semantic discrepancies between languages. It is therefore possible to translate the English dataset into other languages and obtain different sets of labels, again using heuristics. To fully leverage the information in these different sets of labels, we propose NLSSum (Neural Label Search for Summarization), which jointly learns hierarchical weights for these different sets of labels together with our summarization model. We conduct multilingual zero-shot summarization experiments on the MLSUM and WikiLingua datasets and achieve state-of-the-art results under both human and automatic evaluation across these two datasets.
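
The abstract does not say which heuristic is used to derive sentence-level labels from a gold summary. A common choice in extractive summarization is greedy selection: keep adding the sentence that most improves overlap with the gold summary, and stop when nothing helps. The sketch below illustrates that idea only. It is not the authors' implementation, and the function names, the whitespace tokenizer, the `max_sentences` cap, and the unigram-overlap F1 (a simplified stand-in for ROUGE) are all assumptions made for illustration.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1 (assumption)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_labels(sentences, gold_summary, max_sentences=3):
    """Return one 0/1 label per sentence via greedy overlap maximization."""
    reference = gold_summary.lower().split()
    tokenized = [s.lower().split() for s in sentences]
    selected, best_score = [], 0.0
    while len(selected) < max_sentences:
        best_index = None
        for i, tokens in enumerate(tokenized):
            if i in selected:
                continue
            # Score the already selected sentences plus this candidate.
            candidate = [t for j in selected + [i] for t in tokenized[j]]
            score = unigram_f1(candidate, reference)
            if score > best_score:
                best_score, best_index = score, i
        if best_index is None:  # no remaining sentence improves the score
            break
        selected.append(best_index)
    return [1 if i in selected else 0 for i in range(len(sentences))]


# Hypothetical toy document and gold summary.
sentences = [
    "the cat sat on the mat",
    "stocks fell sharply on monday",
    "the dog barked at the cat",
]
labels = greedy_labels(sentences, "the cat sat on the mat while the dog barked")
print(labels)
```

Running the same routine on a translated copy of the document and summary would generally select a different subset of sentences, which is the kind of label disagreement the abstract proposes to weight rather than discard.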
