Paper Title

Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base

Paper Authors

Cunxiang Wang, Fuli Luo, Yanyang Li, Runxin Xu, Fei Huang, Yue Zhang

Paper Abstract

Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks. However, by asking models to do cloze-style tests, recent work finds that PLMs fall short in acquiring knowledge from unstructured text. To understand the internal behaviour of PLMs when retrieving knowledge, we first define knowledge-baring (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and ask professional annotators to label some samples manually. We then find that PLMs are more likely to give wrong predictions on K-B tokens and to pay less attention to those tokens inside the self-attention module. Based on these observations, we develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge-intensive tasks show the effectiveness of the proposed methods. To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
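The abstract only describes the masking idea at a high level, so the following is a minimal, self-contained Python sketch of the general principle of biasing masked-language-model masking toward knowledge-baring (K-B) tokens during continual pre-training. The heuristic K-B detector, the names `is_knowledge_baring` and `select_mask_positions`, and all ratios are illustrative assumptions, not the authors' implementation; the paper identifies K-B tokens in a fully self-supervised way rather than with a hand-written rule.

```python
import random

def is_knowledge_baring(token: str) -> bool:
    """Toy stand-in for a K-B detector: treat capitalized or numeric tokens
    (e.g. entities, dates) as knowledge-baring, everything else as
    knowledge-free (K-F). This heuristic is an assumption for illustration only."""
    return token[:1].isupper() or token.isdigit()

def select_mask_positions(tokens, mask_ratio=0.15, kb_prob=0.8, seed=0):
    """Choose positions to mask, drawing most of the masking budget from
    K-B tokens and the remainder from K-F tokens."""
    rng = random.Random(seed)
    kb = [i for i, t in enumerate(tokens) if is_knowledge_baring(t)]
    kf = [i for i, t in enumerate(tokens) if not is_knowledge_baring(t)]
    n_mask = max(1, int(len(tokens) * mask_ratio))
    n_kb = min(len(kb), int(round(n_mask * kb_prob)))
    n_kf = min(len(kf), n_mask - n_kb)
    return sorted(rng.sample(kb, n_kb) + rng.sample(kf, n_kf))

if __name__ == "__main__":
    tokens = "Marie Curie won the Nobel Prize in 1903 for work on radioactivity".split()
    positions = select_mask_positions(tokens)
    print(" ".join("[MASK]" if i in positions else t for i, t in enumerate(tokens)))
```

Spending most of the masking budget on K-B positions forces the model to predict knowledge-rich content rather than function words, which mirrors the motivation stated in the abstract.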
