Paper Title

Nonparametric Masked Language Modeling

Paper Authors

Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

Paper Abstract

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
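
To make the core idea concrete, here is a minimal sketch of the nonparametric prediction step the abstract describes: instead of a softmax over a fixed vocabulary, the probability of each candidate token comes from similarity between the encoder's [MASK] representation and precomputed token embeddings from a reference corpus. The names `encoder`-side inputs (`query_vec`, `corpus_vecs`, `corpus_tokens`) are hypothetical placeholders, not the released implementation at github.com/facebookresearch/NPM.

```python
# Sketch of NPM-style nonparametric [MASK] filling (illustrative only).
# Assumes a bidirectional encoder has already produced:
#   query_vec     - the vector at the [MASK] position, shape (d,)
#   corpus_vecs   - embeddings of every token position in the corpus, (N, d)
#   corpus_tokens - the surface token at each of those N positions
import numpy as np

def fill_mask(query_vec: np.ndarray,
              corpus_vecs: np.ndarray,
              corpus_tokens: list[str],
              temperature: float = 1.0) -> str:
    """Predict the masked token from a nonparametric distribution
    over corpus positions rather than a fixed output vocabulary."""
    # Similarity between the [MASK] representation and every corpus token.
    scores = corpus_vecs @ query_vec / temperature          # shape (N,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Pool probability mass per surface token: positions sharing a token
    # add up, so rare or unseen-in-vocabulary words (e.g. non-Latin
    # script) are predictable as long as they occur in the corpus.
    token_prob: dict[str, float] = {}
    for tok, p in zip(corpus_tokens, probs):
        token_prob[tok] = token_prob.get(tok, 0.0) + float(p)
    return max(token_prob, key=token_prob.get)
```

Per the abstract, training avoids scoring the full corpus: the retrieval set is approximated by the other sequences in the batch, and the same similarity scores feed a contrastive objective that pulls the [MASK] representation toward embeddings of the correct token and away from the rest.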
