Paper Title
HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation
Paper Authors
Paper Abstract
Language models with the Transformer structure have shown great performance in natural language processing. However, fine-tuning pre-trained language models on downstream tasks still poses problems, such as over-fitting or representation collapse. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making Transformer layers more robust to hidden representation perturbations can further benefit the fine-tuning of PLMs en bloc. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and it both outperforms and is compatible with previous state-of-the-art fine-tuning techniques.
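To make the core idea concrete, below is a minimal, self-contained PyTorch sketch of perturbing the hidden representations fed into each Transformer layer during training. This is an illustration of the general technique, not the authors' implementation: the Gaussian noise, the `noise_std` value, and the `NoisyTransformerEncoder` class are illustrative assumptions.

```python
import torch
import torch.nn as nn


class NoisyTransformerEncoder(nn.Module):
    """Stack of Transformer layers whose layer inputs are perturbed with
    Gaussian noise during training (a HyPe-style hidden perturbation sketch)."""

    def __init__(self, d_model=768, nhead=12, num_layers=12, noise_std=1e-5):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        self.noise_std = noise_std  # assumed scale; in practice tuned per task

    def forward(self, hidden):
        for layer in self.layers:
            if self.training:
                # Perturb the hidden representation entering this layer.
                hidden = hidden + self.noise_std * torch.randn_like(hidden)
            hidden = layer(hidden)
        return hidden


# Noise is injected only in training mode; evaluation is left unchanged.
encoder = NoisyTransformerEncoder(num_layers=2)
x = torch.randn(4, 16, 768)  # (batch, sequence length, hidden size)
encoder.train()
train_out = encoder(x)
encoder.eval()
eval_out = encoder(x)
```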