Paper Title

Fine-tuned Language Models are Continual Learners

Paper Authors

Thomas Scialom, Tuhin Chakrabarty, Smaranda Muresan

Paper Abstract

Recent work on large language models relies on the intuition that most natural language processing tasks can be described via natural language instructions. Language models trained on these instructions show strong zero-shot performance on several standard datasets. However, these models even though impressive still perform poorly on a wide range of tasks outside of their respective training and evaluation sets. To address this limitation, we argue that a model should be able to keep extending its knowledge and abilities, without forgetting previous skills. In spite of the limited success of Continual Learning we show that Language Models can be continual learners. We empirically investigate the reason for this success and conclude that Continual Learning emerges from self-supervision pre-training. Our resulting model Continual-T0 (CT0) is able to learn diverse new tasks, while still maintaining good performance on previous tasks, spanning remarkably through 70 datasets in total. Finally, we show that CT0 is able to combine instructions in ways it was never trained for, demonstrating some compositionality.
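
As a rough illustration of the continual-learning setup the abstract describes, the sketch below fine-tunes an instruction-tuned seq2seq checkpoint on tasks that arrive one after another, mixing a small sample of earlier tasks back into each new task's training data (a rehearsal buffer). This is only a minimal sketch under assumed details: the abstract does not specify the training procedure, and the names train_on_task, task_stream, and rehearsal_fraction are illustrative, not from the paper. The checkpoint name bigscience/T0_3B is the public T0 model; any seq2seq model can stand in for a dry run.

# Minimal sketch: sequential instruction fine-tuning with a rehearsal buffer.
# Assumed details, not taken from the paper: rehearsal_fraction, batch size,
# learning rate, and the toy task_stream below are all illustrative.
import random
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "bigscience/T0_3B"  # public T0 checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_on_task(examples, rehearsal_buffer, rehearsal_fraction=0.01, epochs=1):
    """Fine-tune on one new task while replaying a small sample of old tasks."""
    n_replay = min(int(len(examples) * rehearsal_fraction), len(rehearsal_buffer))
    mixed = examples + random.sample(rehearsal_buffer, n_replay)
    loader = DataLoader(mixed, batch_size=4, shuffle=True, collate_fn=lambda b: b)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            inputs = tokenizer([ex["instruction"] for ex in batch],
                               return_tensors="pt", padding=True, truncation=True)
            labels = tokenizer([ex["target"] for ex in batch],
                               return_tensors="pt", padding=True, truncation=True).input_ids
            labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
            loss = model(**inputs, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    rehearsal_buffer.extend(examples)  # the new task becomes replayable for later tasks

# Toy stream of instruction-formatted tasks, learned one after another.
task_stream = [
    [{"instruction": "Summarize: The cat sat on the mat all day.", "target": "A cat sat on a mat."}],
    [{"instruction": "Simplify: Utilize the apparatus carefully.", "target": "Use the tool carefully."}],
]
rehearsal_buffer = []
for task_examples in task_stream:
    train_on_task(task_examples, rehearsal_buffer)

Rehearsal is one standard way to mitigate catastrophic forgetting in sequential fine-tuning; the paper's actual training recipe and hyperparameters may differ from this sketch.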
