Paper Title
Residual Energy-Based Models for Text
Paper Authors
Paper Abstract
Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmative even if we do not. This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process. We give a formalism for this using the Energy-Based Model framework, and show that it indeed improves the results of the generative models, measured both in terms of perplexity and in terms of human evaluation.
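The core idea of combining a base language model with a globally normalized discriminator can be sketched as a residual energy-based model, p(x) ∝ p_LM(x)·exp(−E(x)): candidates drawn from the base LM are reweighted by exp(−E(x)), where E is a discriminator-derived energy (lower energy for more real-looking text). The minimal Python sketch below is an illustration under that assumption, not the paper's implementation; the candidate strings and energy values are hypothetical toy data.

```python
import math
import random

def residual_weights(energies):
    """Self-normalized importance weights exp(-E(x)) / Z.

    Candidates are assumed to be sampled from the base LM, so the
    residual model p(x) ∝ p_LM(x) * exp(-E(x)) reduces to
    reweighting each sample by exp(-E(x))."""
    ws = [math.exp(-e) for e in energies]
    z = sum(ws)
    return [w / z for w in ws]

def resample(candidates, energies, rng=None):
    """Pick one candidate according to the residual weights
    (one step of sampling-importance-resampling)."""
    rng = rng or random.Random(0)
    probs = residual_weights(energies)
    r, acc = rng.random(), 0.0
    for cand, p in zip(candidates, probs):
        acc += p
        if r <= acc:
            return cand
    return candidates[-1]  # guard against floating-point round-off

# Toy candidates with hypothetical discriminator energies:
# lower energy = judged more "real" by the discriminator.
cands = ["the cat sat", "cat the sat", "sat cat the"]
energies = [0.1, 3.0, 4.0]
print(resample(cands, energies))
```

With these toy energies the first candidate receives roughly 93% of the probability mass, so resampling strongly favors the fluent string while still occasionally keeping alternatives, which is how the discriminator reshapes the LM's distribution without replacing it.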