Paper Title

Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

Paper Authors

Alexandre Tamborrino, Nicola Pellicano, Baptiste Pannier, Pascal Voitot, Louise Naudin

Paper Abstract

Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. Most of the existing approaches rely on a randomly initialized classifier on top of such networks. We argue that this fine-tuning procedure is sub-optimal as the pre-trained model has no prior on the specific classifier labels, while it might have already learned an intrinsic textual representation of the task. In this paper, we introduce a new scoring method that casts a plausibility ranking task in a full-text format and leverages the masked language modeling head tuned during the pre-training phase. We study commonsense reasoning tasks where the model must rank a set of hypotheses given a premise, focusing on the COPA, Swag, HellaSwag and CommonsenseQA datasets. By exploiting our scoring method without fine-tuning, we are able to produce strong baselines (e.g. 80% test accuracy on COPA) that are comparable to supervised approaches. Moreover, when fine-tuning directly on the proposed scoring function, we show that our method provides a much more stable training phase across random restarts (e.g. $\times 10$ standard deviation reduction on COPA test accuracy) and requires less annotated data than the standard classifier approach to reach equivalent performance.
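
To make the scoring idea concrete, below is a minimal sketch of how a pre-trained masked LM can rank hypotheses without any task-specific classifier or fine-tuning, in the spirit of the abstract. This is not the authors' code: the checkpoint (`roberta-large`), the HuggingFace `transformers` API, and the choice to mask the hypothesis tokens one at a time are illustrative assumptions; the paper's exact full-text scoring variant may differ.

```python
# Minimal sketch (not the authors' implementation): rank hypotheses with a
# pre-trained masked LM and no task-specific classifier. The model name, the
# HuggingFace `transformers` API, and the "mask each hypothesis token" variant
# are assumptions made for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")
model.eval()

def mlm_score(premise: str, hypothesis: str) -> float:
    """Sum of log-probabilities of the hypothesis tokens, each predicted while
    masked, conditioned on the premise and the rest of the hypothesis
    (full-text format)."""
    premise_ids = tokenizer(premise, add_special_tokens=False)["input_ids"]
    hyp_ids = tokenizer(hypothesis, add_special_tokens=False)["input_ids"]
    total = 0.0
    for i, target_id in enumerate(hyp_ids):
        masked_hyp = hyp_ids.copy()
        masked_hyp[i] = tokenizer.mask_token_id
        # Premise and partially masked hypothesis packed into one sequence.
        input_ids = torch.tensor(
            [tokenizer.build_inputs_with_special_tokens(premise_ids, masked_hyp)]
        )
        with torch.no_grad():
            logits = model(input_ids).logits
        mask_pos = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
        log_probs = torch.log_softmax(logits[0, mask_pos[0]], dim=-1)
        total += log_probs[target_id].item()
    return total

# Zero-shot ranking on a COPA-style example: the highest-scoring hypothesis wins.
premise = "The man broke his toe. What was the cause of this?"
candidates = ["He dropped a hammer on his foot.", "He got a hole in his sock."]
print(max(candidates, key=lambda h: mlm_score(premise, h)))
```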
