Paper Title

Scaling Instruction-Finetuned Language Models

Paper Authors

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei

Paper Abstract

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
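Since the abstract notes that the Flan-T5 checkpoints are publicly released, here is a minimal sketch of trying one out, assuming the Hugging Face `transformers` library and the publicly hosted `google/flan-t5-base` model ID (the library, model ID, and prompts are assumptions for illustration, not code from the paper). It shows a zero-shot instruction prompt and a chain-of-thought-style prompt, two of the evaluation settings mentioned above.

```python
# Minimal sketch (assumptions: Hugging Face transformers is installed and the
# "google/flan-t5-base" checkpoint is used; not code from the paper itself).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # larger released variants: flan-t5-large / -xl / -xxl
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run a single instruction prompt through the model and decode the output."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Zero-shot instruction following.
print(generate("Answer the following question. What is the capital of France?"))

# Chain-of-thought-style prompt: ask the model to reason step by step.
print(generate(
    "Q: A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there? "
    "Let's think step by step."
))
```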
