Paper Title
Efficient Training of Language Models to Fill in the Middle
Paper Authors
Paper Abstract
We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research.
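The data transformation the abstract describes, moving a span from the middle of a document to its end, is straightforward to sketch. Below is a minimal character-level illustration in Python; the `<PRE>`, `<SUF>`, `<MID>` sentinel strings and the `fim_rate` parameter are assumptions for this sketch, since the abstract does not specify them, and the actual training pipeline operates on tokens with settings chosen via the paper's ablations.

```python
import random

# Hypothetical sentinel strings used only for this sketch; real models use
# dedicated special tokens chosen during training.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """With probability `fim_rate`, move a random middle span of the document
    to its end (prefix-suffix-middle ordering), so an autoregressive model can
    learn to generate the middle conditioned on both sides. Otherwise return
    the document unchanged for ordinary left-to-right training."""
    if random.random() >= fim_rate:
        return document
    # Pick two random cut points splitting the document into prefix / middle / suffix.
    a, b = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:a], document[a:b], document[b:]
    # Reorder the pieces so the middle span appears last in the training example.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

# Example: a randomly chosen middle span is moved to the end of the example.
print(fim_transform("The quick brown fox jumps over the lazy dog.", fim_rate=1.0))
```

In this sketch, applying the transformation to only a fraction of documents (controlled by `fim_rate`) corresponds to the data transformation frequency that the abstract lists among the ablated hyperparameters.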