Paper Title
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Paper Authors
Paper Abstract
Automatic open-domain dialogue evaluation is a crucial component of dialogue systems. Recently, learning-based evaluation metrics have achieved state-of-the-art performance in open-domain dialogue evaluation. However, these metrics focus on only a few qualities and thus struggle to evaluate dialogue comprehensively. Furthermore, they lack an effective score composition approach for combining diverse evaluation qualities. To address these problems, we propose Multi-Metric Evaluation based on Correlation Re-Scaling (MME-CRS) for evaluating open-domain dialogue. First, we build an evaluation metric composed of 5 groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to evaluate dialogue quality comprehensively. Furthermore, we propose a novel score composition method called Correlation Re-Scaling (CRS) to model the relationship between sub-metrics and diverse qualities. Our approach MME-CRS ranks first by a large margin on the final test data of the DSTC10 track5 subtask1 Automatic Open-domain Dialogue Evaluation Challenge, which demonstrates the effectiveness of our proposed approach.
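The abstract does not spell out how CRS composes sub-metric scores. As a hypothetical illustration only (the function names and the clip-then-normalize weighting scheme below are assumptions, not the paper's actual method), one simple way to "re-scale" sub-metrics by their correlation with human judgments of a quality is to weight each sub-metric by its clipped, normalized Pearson correlation:

```python
import numpy as np

def crs_weights(sub_scores: np.ndarray, human_scores: np.ndarray) -> np.ndarray:
    """Toy correlation-based weights for one evaluation quality.

    sub_scores:   (n_samples, n_metrics) scores from each sub-metric
    human_scores: (n_samples,) human ratings of this quality
    """
    # Pearson correlation of each sub-metric with the human ratings
    corrs = np.array([
        np.corrcoef(sub_scores[:, j], human_scores)[0, 1]
        for j in range(sub_scores.shape[1])
    ])
    # Drop sub-metrics that correlate negatively, then normalize to sum to 1
    corrs = np.clip(corrs, 0.0, None)
    return corrs / corrs.sum()

def composite_score(sub_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of sub-metric scores per dialogue sample."""
    return sub_scores @ weights
```

In this sketch a sub-metric that tracks human judgments closely dominates the composite score, while an uncorrelated or anti-correlated one is zeroed out; the actual CRS method in the paper may differ.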