Paper Title

PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation

Authors

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari

Abstract

In language generation models conditioned on structured data, classical training via maximum likelihood almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions), and to incorporate them erroneously in their own generations at inference. In this work, we build on top of previous Reinforcement Learning-based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is efficient at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the effectiveness of this framework compared to state-of-the-art models.
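The abstract describes using the PARENT metric as a reward signal inside a reinforcement-learning objective. Below is a minimal sketch of that idea, assuming a self-critical REINFORCE setup in PyTorch; the function `self_critical_loss`, the tensor shapes, and the dummy reward numbers are illustrative assumptions rather than the authors' released code, and in practice the rewards would come from an actual PARENT-metric implementation.

```python
import torch

def self_critical_loss(sample_logprobs: torch.Tensor,
                       sample_reward: float,
                       greedy_reward: float) -> torch.Tensor:
    """REINFORCE with a greedy (self-critical) baseline.

    sample_logprobs: log-probabilities of the tokens of a sequence sampled
                     from the current policy, shape (seq_len,).
    sample_reward:   PARENT score of the sampled sequence w.r.t. the input table.
    greedy_reward:   PARENT score of the greedy decode, used as the baseline.
    """
    advantage = sample_reward - greedy_reward
    # Negative sign: minimizing this loss raises the likelihood of samples
    # whose PARENT reward beats the greedy baseline, and lowers it otherwise.
    return -advantage * sample_logprobs.sum()

# Toy usage with dummy values (real rewards come from the PARENT metric).
logprobs = torch.tensor([-0.2, -1.1, -0.7], requires_grad=True)
loss = self_critical_loss(logprobs, sample_reward=0.62, greedy_reward=0.55)
loss.backward()
```

Because the reward is computed on the decoded text alone, this objective is model-agnostic: any generator that can expose per-token log-probabilities of its samples can be fine-tuned this way.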
