Paper Title
Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis
Paper Authors
Paper Abstract
Training abstractive summarization models typically requires large amounts of data, which can be a limitation for many domains. In this paper we explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods when applied to small corpora of student reflections. First, we explore whether tuning a state-of-the-art model trained on newspaper data can boost performance on student reflection data. Evaluations demonstrate that summaries produced by the tuned model achieve higher ROUGE scores than those of models trained on only student reflection data or only newspaper data. The tuned model also achieves higher scores than extractive summarization baselines, and is additionally judged to produce more coherent and readable summaries in human evaluations. Second, we explore whether synthesizing summaries of student data can further boost performance. We propose a template-based model to synthesize new data, which, when incorporated into training, further increases ROUGE scores. Finally, we show that combining data synthesis with domain transfer achieves higher ROUGE scores than using either approach alone.
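The abstract reports results in terms of ROUGE scores, the standard n-gram overlap metric for summarization. As a minimal illustration of what such a score measures (the paper's actual evaluation setup is not specified here), the sketch below computes ROUGE-1, i.e. unigram recall, precision, and F1 between a candidate summary and a single reference, using simple whitespace tokenization:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1: unigram overlap between a candidate summary and a
    reference summary. Returns (recall, precision, f1).

    Tokenization here is plain lowercased whitespace splitting; real
    ROUGE implementations also apply stemming and handle multiple
    references, which this sketch omits.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1

# Toy example (invented sentences, not from the paper's corpus):
# candidate shares 3 of the reference's 5 unigrams.
r, p, f = rouge_1("students found recursion hard",
                  "students said recursion was hard")
print(r, p, f)  # 0.6 0.75 ~0.667
```

A higher ROUGE-1 score indicates that a system summary reuses more of the reference summary's words, which is the sense in which the tuned and synthesis-augmented models above outperform the baselines.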