Paper Title
On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model
Paper Authors
Paper Abstract
Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability. However, the in-depth analysis of when in-context learning occurs is still lacking. For example, it is unknown how in-context learning performance changes as the training corpus varies. Here, we investigate the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model. From our in-depth investigation, we introduce the following observations: (1) in-context learning performance heavily depends on the corpus domain source, and the size of the pretraining corpus does not necessarily determine the emergence of in-context learning, (2) in-context learning ability can emerge when a language model is trained on a combination of multiple corpora, even when each corpus does not result in in-context learning on its own, (3) pretraining with a corpus related to a downstream task does not always guarantee the competitive in-context learning performance of the downstream task, especially in the few-shot setting, and (4) the relationship between language modeling (measured in perplexity) and in-context learning does not always correlate: e.g., low perplexity does not always imply high in-context few-shot learning performance.
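To make observation (4) concrete, the sketch below contrasts the two quantities being compared: token-level perplexity from language modeling, and label prediction from a k-shot prompt. This is a hypothetical illustration rather than the paper's evaluation code; it assumes a generic Hugging Face causal LM ("gpt2") as a stand-in for HyperCLOVA, and the prompt format and the perplexity / few_shot_label helpers are invented for the example.

# Hypothetical sketch, not the paper's code: the two quantities in observation (4),
# language-modeling perplexity vs. in-context few-shot label prediction.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies HyperCLOVA variants, which are not assumed here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text):
    """Token-level perplexity of `text` under the model (exp of mean cross-entropy)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def few_shot_label(demos, query, labels):
    """Pick the label whose continuation of a k-shot prompt scores highest."""
    prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demos)
    prompt += f"Review: {query}\nSentiment:"
    scores = {}
    for label in labels:
        ids = tokenizer(prompt + " " + label, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[:, :-1]          # predictions for tokens 1..L-1
        logprobs = torch.log_softmax(logits, dim=-1)
        target = ids[:, 1:]
        tok_lp = logprobs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
        n_label = tokenizer(" " + label, return_tensors="pt").input_ids.shape[1]
        scores[label] = tok_lp[0, -n_label:].sum().item()  # score only the label tokens
    return max(scores, key=scores.get)

# Usage: a model can assign low perplexity to raw text yet still pick the wrong label
# from a few-shot prompt, which is the kind of mismatch observation (4) points at.
print(perplexity("The movie was a delight from start to finish."))
print(few_shot_label([("Great acting.", "positive"), ("Dull plot.", "negative")],
                     "I loved every minute.", ["positive", "negative"]))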