Paper Title

Measuring and Improving Semantic Diversity of Dialogue Generation

Authors

Seungju Han, Beomsu Kim, Buru Chang

Abstract

Response diversity has become an important criterion for evaluating the quality of open-domain dialogue generation models. However, current evaluation metrics for response diversity often fail to capture the semantic diversity of generated responses, as they mainly consider lexical aspects of the generated responses. In this paper, we introduce a new automatic evaluation metric to measure the semantic diversity of generated responses. Through human evaluation, we demonstrate that our proposed metric captures human judgments on response diversity better than existing lexical-level diversity metrics. Furthermore, motivated by analyzing an existing dialogue dataset, we propose a simple yet effective learning method that improves the semantic diversity of generated responses. Our learning method weights training samples based on the semantic distribution of the training set. We show that our learning method improves response diversity and coherency better than other baseline methods through automatic and human evaluation.
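
The paper's exact metric is not reproduced here; as a hedged illustration of what an embedding-based semantic diversity score can look like, the sketch below rates a set of generated responses by the mean pairwise cosine distance between their sentence embeddings. The encoder name `all-MiniLM-L6-v2`, the function name `semantic_diversity`, and the distance-based formulation are illustrative assumptions, not the paper's definition.

```python
# Minimal sketch of a semantic (rather than lexical) diversity score.
# Assumption: responses are embedded with a sentence encoder and diversity
# is the mean pairwise cosine distance between embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

def semantic_diversity(responses: list[str],
                       model_name: str = "all-MiniLM-L6-v2") -> float:
    """Mean pairwise cosine distance between response embeddings (higher = more diverse)."""
    model = SentenceTransformer(model_name)
    emb = model.encode(responses, normalize_embeddings=True)  # (n, d), unit-norm rows
    sims = emb @ emb.T                                        # cosine similarity matrix
    iu = np.triu_indices(len(responses), k=1)                 # unique unordered pairs
    return float(np.mean(1.0 - sims[iu]))

# Example: lexically distinct but semantically similar responses score low.
print(semantic_diversity(["i don't know.", "no idea, sorry.", "i have no clue."]))
print(semantic_diversity(["i love hiking.", "the train is late.", "try the pasta."]))
```

Unlike lexical metrics such as distinct-n, a score of this shape is unaffected by paraphrases that reuse few words, which is the failure mode the abstract attributes to existing metrics.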
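Likewise, a minimal sketch of weighting training samples by the semantic distribution of the training set: cluster the embeddings of training responses and up-weight samples whose semantic cluster is rare, so frequent generic responses contribute less to the loss. The cluster count, the inverse-frequency weighting form, and the helper name `semantic_sample_weights` are assumptions for illustration, not the paper's method.

```python
# Hedged sketch: per-sample weights from the training set's semantic distribution.
# Assumption: rarity of a response's semantic cluster is a proxy for how much
# the sample should be up-weighted during training.
import numpy as np
from sklearn.cluster import KMeans

def semantic_sample_weights(embeddings: np.ndarray, n_clusters: int = 50) -> np.ndarray:
    """Weights inversely proportional to the frequency of each sample's semantic cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    counts = np.bincount(labels, minlength=n_clusters)   # cluster sizes
    weights = 1.0 / counts[labels]                       # rare clusters -> larger weights
    return weights * (len(weights) / weights.sum())      # normalize to mean 1.0

# During training, each example's cross-entropy loss would be scaled by its weight
# before averaging, e.g.: loss = (weights_batch * per_example_ce).mean()
```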
