Paper Title

Train No Evil: Selective Masking for Task-Guided Pre-Training

Paper Authors

Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

Paper Abstract

Recently, pre-trained language models mostly follow the pre-train-then-fine-tune paradigm and have achieved great performance on various downstream tasks. However, since the pre-training stage is typically task-agnostic and the fine-tuning stage usually suffers from insufficient supervised data, the models cannot always capture the domain-specific and task-specific patterns well. In this paper, we propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning. In this stage, the model is trained by masked language modeling on in-domain unsupervised data to learn domain-specific patterns, and we propose a novel selective masking strategy to learn task-specific patterns. Specifically, we design a method to measure the importance of each token in sequences and selectively mask the important tokens. Experimental results on two sentiment analysis tasks show that our method can achieve comparable or even better performance with less than 50% of the computation cost, which indicates that our method is both effective and efficient. The source code of this paper can be obtained from https://github.com/thunlp/SelectiveMasking.
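The abstract describes masking the most important tokens during task-guided pre-training rather than sampling mask positions uniformly. The released repository implements the full method; the sketch below is only a minimal illustration of that masking step, assuming per-token importance scores have already been computed (e.g., by a classifier fine-tuned on the downstream task) and using BERT-style special-token ids with the usual 80/10/10 replacement rule. The function name `selective_mask` and all numeric ids here are illustrative assumptions, not the paper's actual code.

```python
import random
from typing import List, Tuple

# Illustrative BERT-style ids (assumption): [MASK]=103, [CLS]=101, [SEP]=102, [PAD]=0
MASK_ID = 103


def selective_mask(
    token_ids: List[int],
    importance: List[float],
    mask_ratio: float = 0.15,
    vocab_size: int = 30522,
    special_ids: Tuple[int, ...] = (101, 102, 0),
) -> Tuple[List[int], List[int]]:
    """Mask the highest-importance tokens instead of uniformly sampled positions.

    Returns (masked_input_ids, labels), where labels use -100 (the common
    PyTorch ignore index) for positions excluded from the MLM loss.
    """
    assert len(token_ids) == len(importance)
    n_to_mask = max(1, int(round(mask_ratio * len(token_ids))))

    # Rank candidate positions by importance, skipping special tokens.
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    candidates.sort(key=lambda i: importance[i], reverse=True)
    chosen = set(candidates[:n_to_mask])

    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i in chosen:
        labels[i] = token_ids[i]
        r = random.random()
        if r < 0.8:                      # 80%: replace with [MASK]
            inputs[i] = MASK_ID
        elif r < 0.9:                    # 10%: replace with a random token
            inputs[i] = random.randrange(vocab_size)
        # remaining 10%: keep the original token
    return inputs, labels


# Toy usage with made-up ids and scores (roughly "[CLS] i love this movie [SEP]"
# in a BERT-style vocabulary); importance would come from a task-specific scorer.
ids = [101, 1045, 2293, 2023, 3185, 102]
importance = [0.0, 0.1, 0.9, 0.2, 0.6, 0.0]
inputs, labels = selective_mask(ids, importance)
```

The key difference from standard masked language modeling is only in how positions are chosen: the 80/10/10 corruption rule is unchanged, but the positions are the top-ranked tokens under the task-derived importance scores rather than a uniform sample.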
