Paper Title
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Paper Authors
Paper Abstract
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalization: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
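Among the instruction-tuning decisions the abstract lists is the task sampling strategy. As one concrete illustration, below is a minimal Python sketch of examples-proportional mixing with a per-task cap, a strategy used in prior instruction-tuning work (e.g., FLAN); the dataset names, sizes, and cap value are assumptions for illustration, not the paper's actual scheme.

```python
import random

# Illustrative dataset sizes; not the actual OPT-IML Bench statistics.
dataset_sizes = {"task_a": 500_000, "task_b": 20_000, "task_c": 1_000}

def mixing_weights(sizes, cap=30_000):
    """Examples-proportional weights with a per-task cap so that very
    large datasets do not dominate the mixture (cap value is assumed)."""
    clipped = {task: min(n, cap) for task, n in sizes.items()}
    total = sum(clipped.values())
    return {task: n / total for task, n in clipped.items()}

weights = mixing_weights(dataset_sizes)
# Draw the source task for each training example according to the weights.
batch = random.choices(list(weights), weights=list(weights.values()), k=8)
print(weights, batch)
```

The cap is the lever that trades off between benchmark-proportional sampling (no cap) and uniform-per-task sampling (a very small cap), which is one axis of the sampling trade-offs the paper studies.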
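The evaluation framework distinguishes three levels of generalization. The following sketch shows how such splits could be constructed from a category-organized benchmark; the category and task names are invented for illustration and are not the actual OPT-IML Bench contents.

```python
# Hypothetical sketch of the three evaluation splits described in the abstract.
# Category and task names below are invented, not the benchmark's real contents.
benchmark = {
    "sentiment": ["sst2", "imdb", "yelp"],
    "qa": ["squad", "triviaqa"],
    "summarization": ["xsum", "cnn_dm"],
}

held_out_categories = {"summarization"}  # (1) fully held-out categories
held_out_tasks = {"yelp"}                # (2) held-out tasks from seen categories

train_tasks, eval_held_out_category, eval_held_out_task = [], [], []
for category, tasks in benchmark.items():
    for task in tasks:
        if category in held_out_categories:
            eval_held_out_category.append(task)
        elif task in held_out_tasks:
            eval_held_out_task.append(task)
        else:
            # (3) held-out *instances* of these training tasks form the
            # third evaluation split (seen task, unseen examples).
            train_tasks.append(task)

print(train_tasks, eval_held_out_category, eval_held_out_task)
```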