Paper Title
Active Example Selection for In-Context Learning
Authors
Abstract
With a handful of demonstration examples, large-scale language models show a strong capability to perform various tasks by in-context learning from these examples, without any fine-tuning. We demonstrate that in-context learning performance can be highly unstable across samples of examples, indicating the idiosyncrasies of how language models acquire information. We formulate example selection for in-context learning as a sequential decision problem, and propose a reinforcement learning algorithm for identifying generalizable policies to select demonstration examples. For GPT-2, our learned policies demonstrate strong generalization to tasks unseen during training, with a $5.8\%$ improvement on average. Examples selected by our learned policies even achieve a small improvement on GPT-3 Ada. However, the improvement diminishes on larger GPT-3 models, suggesting emergent capabilities of large language models.
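To make the sequential-decision framing concrete, here is a minimal toy sketch (not the paper's actual algorithm or reward): the state is the set of demonstration examples chosen so far, each action appends one more candidate example, and a terminal reward stands in for the language model's in-context-learning accuracy with that prompt. A tabular Q-learning loop then learns a selection policy. All names, sizes, and the reward function below are illustrative assumptions.

```python
import random

POOL_SIZE = 6   # number of candidate demonstration examples (assumed)
K = 3           # demonstrations to place in the prompt (assumed)

def reward(selected):
    # Hypothetical stand-in for downstream LM accuracy; in the real setting
    # this would require querying the language model on a validation set.
    # Here we simply pretend lower-index examples are better demonstrations.
    return sum(1.0 / (i + 1) for i in selected)

def train_policy(episodes=2000, eps=0.2, alpha=0.5, seed=0):
    """Tabular Q-learning over (state, action) pairs, where a state is the
    sorted tuple of example indices selected so far."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        state = ()
        while len(state) < K:
            actions = [a for a in range(POOL_SIZE) if a not in state]
            # Epsilon-greedy exploration over the remaining candidates.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q.get((state, x), 0.0))
            nxt = tuple(sorted(state + (a,)))
            r = reward(nxt) if len(nxt) == K else 0.0  # reward only at the end
            nxt_actions = [b for b in range(POOL_SIZE) if b not in nxt]
            target = r + max((Q.get((nxt, b), 0.0) for b in nxt_actions),
                             default=0.0)
            old = Q.get((state, a), 0.0)
            Q[(state, a)] = old + alpha * (target - old)
            state = nxt
    return Q

def greedy_selection(Q):
    """Roll out the learned policy greedily to pick K demonstrations."""
    state = ()
    while len(state) < K:
        actions = [a for a in range(POOL_SIZE) if a not in state]
        a = max(actions, key=lambda x: Q.get((state, x), 0.0))
        state = tuple(sorted(state + (a,)))
    return state

Q = train_policy()
chosen = greedy_selection(Q)
print(chosen)
```

The paper's contribution lies in making such a policy generalize across tasks and transfer between models (e.g. from GPT-2 to GPT-3 Ada); this sketch only illustrates the underlying decision process on a single toy task.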