Paper Title

Nonparametric Masked Language Modeling

Paper Authors

Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

Paper Abstract

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
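
To make the core idea concrete, here is a minimal sketch of the nonparametric prediction step the abstract describes: instead of a softmax over a fixed vocabulary, the probability of each candidate token comes from similarity between the encoder's [MASK] representation and precomputed token embeddings from a reference corpus. The names `encoder`-side inputs (`query_vec`, `corpus_vecs`, `corpus_tokens`) are hypothetical placeholders, not the released implementation at github.com/facebookresearch/NPM.

```python
# Sketch of NPM-style nonparametric [MASK] filling (illustrative only).
# Assumes a bidirectional encoder has already produced:
#   query_vec     - the vector at the [MASK] position, shape (d,)
#   corpus_vecs   - embeddings of every token position in the corpus, (N, d)
#   corpus_tokens - the surface token at each of those N positions
import numpy as np

def fill_mask(query_vec: np.ndarray,
              corpus_vecs: np.ndarray,
              corpus_tokens: list[str],
              temperature: float = 1.0) -> str:
    """Predict the masked token from a nonparametric distribution
    over corpus positions rather than a fixed output vocabulary."""
    # Similarity between the [MASK] representation and every corpus token.
    scores = corpus_vecs @ query_vec / temperature          # shape (N,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Pool probability mass per surface token: positions sharing a token
    # add up, so rare or unseen-in-vocabulary words (e.g. non-Latin
    # script) are predictable as long as they occur in the corpus.
    token_prob: dict[str, float] = {}
    for tok, p in zip(corpus_tokens, probs):
        token_prob[tok] = token_prob.get(tok, 0.0) + float(p)
    return max(token_prob, key=token_prob.get)
```

Per the abstract, training avoids scoring the full corpus: the retrieval set is approximated by the other sequences in the batch, and the same similarity scores feed a contrastive objective that pulls the [MASK] representation toward embeddings of the correct token and away from the rest.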
