Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR

Paper title

Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR

Authors

Juri Opitz, Anette Frank

Abstract

Systems that generate natural language text from abstract meaning representations such as AMR are typically evaluated using automatic surface matching metrics that compare the generated texts to reference texts from which the input meaning representations were constructed. We show that besides well-known issues from which such metrics suffer, an additional problem arises when applying these metrics for AMR-to-text evaluation, since an abstract meaning representation allows for numerous surface realizations. In this work we aim to alleviate these issues by proposing $\mathcal{M}\mathcal{F}_\beta$, a decomposable metric that builds on two pillars. The first is the principle of meaning preservation $\mathcal{M}$: it measures to what extent a given AMR can be reconstructed from the generated sentence using SOTA AMR parsers and applying (fine-grained) AMR evaluation metrics to measure the distance between the original and the reconstructed AMR. The second pillar builds on a principle of (grammatical) form $\mathcal{F}$ that measures the linguistic quality of the generated text, which we implement using SOTA language models. In two extensive pilot studies we show that fulfillment of both principles offers benefits for AMR-to-text evaluation, including explainability of scores. Since $\mathcal{M}\mathcal{F}_\beta$ does not necessarily rely on gold AMRs, it may extend to other text generation tasks.
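
The abstract does not state how the two pillars are combined into a single score. As a minimal sketch, assume that the $\beta$ subscript plays the same role as in the familiar $F_\beta$ score, so that the overall score is a weighted harmonic mean of a meaning score and a form score, both in $[0, 1]$. The function name `mf_beta`, its parameters, and the combination formula are illustrative assumptions and are not taken from the text above.

```python
def mf_beta(meaning: float, form: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of a meaning score and a form score.

    ASSUMPTION: the combination mirrors the standard F_beta formula,
    with `meaning` in the role of precision and `form` in the role of
    recall. The abstract itself does not give this formula.
    """
    if not (0.0 <= meaning <= 1.0 and 0.0 <= form <= 1.0):
        raise ValueError("meaning and form must lie in [0, 1]")
    if beta < 0.0:
        raise ValueError("beta must be non-negative")

    denominator = beta ** 2 * meaning + form
    if denominator == 0.0:
        # Both components are zero (or beta == 0 and form == 0).
        return 0.0
    return (1.0 + beta ** 2) * meaning * form / denominator


if __name__ == "__main__":
    # Hypothetical component scores for one generated sentence.
    print(mf_beta(meaning=0.5, form=1.0, beta=1.0))
    print(mf_beta(meaning=0.5, form=1.0, beta=0.5))
```

Under this assumed form, the score stays decomposable: a low overall value can be traced back to either component. With the arrangement chosen here, values of `beta` below 1 pull the result towards the meaning score and values above 1 pull it towards the form score.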
