Metric-guided Distillation: Distilling Knowledge from the Metric to Ranker and Retriever for Generative Commonsense Reasoning

Paper Title

Metric-guided Distillation: Distilling Knowledge from the Metric to Ranker and Retriever for Generative Commonsense Reasoning

Authors

He, Xingwei; Gong, Yeyun; Jin, A-Long; Qi, Weizhen; Zhang, Hang; Jiao, Jian; Zhou, Bartuer; Cheng, Biao; Yiu, SM; Duan, Nan

Abstract

Commonsense generation aims to generate a realistic sentence describing a daily scene under the given concepts, which is very challenging, since it requires models to have relational reasoning and compositional generalization capabilities. Previous work focuses on retrieving prototype sentences for the provided concepts to assist generation. They first use a sparse retriever to retrieve candidate sentences, then re-rank the candidates with a ranker. However, the candidates returned by their ranker may not be the most relevant sentences, since the ranker treats all candidates equally without considering their relevance to the reference sentences of the given concepts. Another problem is that re-ranking is very expensive, but only using retrievers will seriously degrade the performance of their generation models. To solve these problems, we propose the metric distillation rule to distill knowledge from the metric (e.g., BLEU) to the ranker. We further transfer the critical knowledge summarized by the distilled ranker to the retriever. In this way, the relevance scores of candidate sentences predicted by the ranker and retriever will be more consistent with their quality measured by the metric. Experimental results on the CommonGen benchmark verify the effectiveness of our proposed method: (1) Our generation model with the distilled ranker achieves a new state-of-the-art result. (2) Our generation model with the distilled retriever even surpasses the previous SOTA.
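
The abstract does not spell out the metric distillation rule, so the following is only a minimal sketch of the general idea under stated assumptions: metric scores of the candidates against the reference sentences are normalized with a softmax into soft targets, and the ranker is trained to match them with a KL-divergence loss. The clipped unigram precision used here is a simplified stand-in for BLEU, and every function name, the temperature value, and the example sentences are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: a simplified stand-in for BLEU (assumption)."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)


def metric_score(candidate: str, references: list[str]) -> float:
    """Score a candidate by its best match over all reference sentences."""
    return max(unigram_precision(candidate, ref) for ref in references)


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(ranker_scores: list[float],
                      metric_scores: list[float],
                      temperature: float = 0.1) -> float:
    """KL(teacher || student): the metric is the teacher, the ranker the student.

    The temperature sharpens the teacher distribution; 0.1 is an arbitrary
    illustrative value, not one reported in the paper.
    """
    teacher = softmax(metric_scores, temperature)
    student = softmax(ranker_scores)
    return sum(
        t * math.log(t / max(s, 1e-12))  # clamp guards against log of zero
        for t, s in zip(teacher, student)
        if t > 0.0
    )


# Hypothetical example: one reference sentence and three retrieved candidates.
references = ["a dog catches a frisbee in the park"]
candidates = [
    "a dog jumps to catch a frisbee",
    "the stock market fell today",
    "a dog runs in the park",
]
teacher_scores = [metric_score(c, references) for c in candidates]

# Ranker scores that agree with the metric ordering incur a lower loss
# than scores that disagree with it.
aligned_loss = distillation_loss([1.0, -2.0, 3.0], teacher_scores)
misaligned_loss = distillation_loss([-2.0, 3.0, 1.0], teacher_scores)
```

Minimizing such a loss pushes the ranker's relevance scores toward the ordering induced by the metric, which is the consistency the abstract describes. The same pattern could in principle be repeated with the distilled ranker as the teacher and the retriever as the student, though the abstract gives no details on how that transfer is done.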
