Paper Title
Standing on the Shoulders of Giant Frozen Language Models
Paper Authors
Paper Abstract
Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches that modify these weights in a task-dependent way. Those, in turn, suffer from forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and that more powerful methods for leveraging frozen LMs can do just as well as fine-tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen-model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: frozen models have untapped potential, and fine-tuning is often unnecessary.
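As a concrete illustration of the kind of frozen-model technique the abstract contrasts with fine-tuning, the sketch below implements vanilla prompt tuning: a short sequence of learnable "soft prompt" embeddings is trained while every weight of the LM stays frozen. This is a minimal sketch under stated assumptions, not the paper's code; the model name "gpt2", the prompt length of 20, and the helper `step` are illustrative choices, not from the source.

```python
# Minimal prompt-tuning sketch: only the soft prompt is trained; the LM stays frozen.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                      # small stand-in for a huge frozen LM (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.requires_grad_(False)                 # freeze every LM weight

prompt_len = 20                          # illustrative soft-prompt length (assumption)
emb_dim = lm.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(torch.randn(prompt_len, emb_dim) * 0.02)  # only trainable tensor
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

def step(text: str) -> float:
    """One training step on a single text example."""
    ids = tok(text, return_tensors="pt").input_ids             # (1, T)
    tok_emb = lm.get_input_embeddings()(ids)                   # (1, T, D), no gradient (frozen)
    inputs = torch.cat([soft_prompt.unsqueeze(0), tok_emb], dim=1)
    # Ignore the loss at the soft-prompt positions (-100 is the ignore index).
    labels = torch.cat([torch.full((1, prompt_len), -100), ids], dim=1)
    loss = lm(inputs_embeds=inputs, labels=labels).loss
    opt.zero_grad()
    loss.backward()                                             # gradients reach only soft_prompt
    opt.step()
    return loss.item()

print(step("Frozen language models can be adapted without touching their weights."))
```

Because the LM's weights never change, the same frozen model can serve many tasks, each carrying only its own small learned prompt; the paper's point is that richer schemes built on this principle can close the gap with fine-tuning.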