Paper Title

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

Authors

Huang, Lishan; Ye, Zheng; Qin, Jinghui; Lin, Liang; Liang, Xiaodan

Abstract

Automatically evaluating dialogue coherence is a challenging but highly demanded ability for developing high-quality open-domain dialogue systems. However, current evaluation metrics consider only surface features or utterance-level semantics, without explicitly considering the fine-grained topic transition dynamics of dialogue flows. Here, we first consider that the graph structure formed by the topics in a dialogue can accurately depict the underlying communication logic, which is a more natural way to produce persuasive metrics. Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric, GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation. Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence. The graph representations are obtained by reasoning over topic-level dialogue graphs enhanced with evidence from a commonsense graph, including k-hop neighboring representations and hop-attention weights. Experimental results show that GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models in terms of Pearson and Spearman correlations with human judgements. In addition, we release a new large-scale human evaluation benchmark to facilitate future research on automatic metrics.
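
The abstract reports agreement with human judgements as Pearson and Spearman correlations. The sketch below shows how such correlations could be computed for any automatic metric. It is a minimal illustration, not the authors' evaluation code: the function names and the sample scores are hypothetical, and the score lists are made up for demonstration only.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one metric score and one human rating per response.
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.95]
human_ratings = [1, 2, 3, 4, 5]

print("Pearson:", pearson(metric_scores, human_ratings))
print("Spearman:", spearman(metric_scores, human_ratings))
```

Pearson measures linear agreement between the raw scores, while Spearman depends only on their rank order, so a metric that orders responses correctly but on a nonlinear scale can score higher on Spearman than on Pearson.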
