Paper Title

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Paper Authors

Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

Paper Abstract

Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of this success. This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification? (2) How can we mathematically formalize this connection and quantify the benefit of language modeling? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as sentence completion tasks, thus making language modeling a meaningful pretraining task. With a mathematical formalization of this hypothesis, we make progress towards (2) and show that language models that are $\epsilon$-optimal in cross-entropy (log-perplexity) learn features that can linearly solve such classification tasks with $\mathcal{O}(\sqrt{\epsilon})$ error, thus demonstrating that doing well on language modeling can be beneficial for downstream tasks. We experimentally verify various assumptions and theoretical findings, and also use insights from the analysis to design a new objective function that performs well on some classification tasks.
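
To make the abstract's central hypothesis concrete, that a classification task of interest can be recast as a sentence completion task so that a good next-word predictor already carries the class signal, here is a minimal, hypothetical sketch (not code from the paper): zero-shot sentiment classification with a pretrained causal language model. The model name "gpt2", the completion prompt, and the label words " great" / " terrible" are illustrative assumptions, not choices specified by the authors.

```python
# Minimal sketch of "classification as sentence completion" (illustrative, not the paper's code).
# A pretrained causal LM scores candidate next words after a task-specific prompt,
# and the class whose label word gets more probability mass wins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # model choice is an assumption
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def classify_sentiment(review: str) -> str:
    # Reformulate classification as completing a sentence whose next word reveals the label.
    prompt = review + " This movie is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_word_logits = model(input_ids).logits[0, -1]   # distribution over the next token
    probs = torch.softmax(next_word_logits, dim=-1)
    # Compare probability assigned to one label word per class (hypothetical verbalizers).
    p_pos = probs[tokenizer.encode(" great")[0]]
    p_neg = probs[tokenizer.encode(" terrible")[0]]
    return "positive" if p_pos > p_neg else "negative"

print(classify_sentiment("An unforgettable film with stunning performances."))
```

The paper's theoretical result concerns a stronger setup than this zero-shot comparison: using the model's next-word probability distribution (or features that determine it) as input to a linear classifier, for which an $\epsilon$-optimal language model yields $\mathcal{O}(\sqrt{\epsilon})$ classification error on tasks that admit such a sentence-completion reformulation.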
