Paper Title

Large Language Models with Controllable Working Memory

Paper Authors

Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

Paper Abstract

Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. While many downstream applications provide the model with an informational context to aid its performance on the underlying task, how the model's world knowledge interacts with the factual information presented in the context remains underexplored. As a desirable behavior, an LLM should give precedence to the context whenever it contains task-relevant information that conflicts with the model's memorized knowledge. This enables model predictions to be grounded in the context, which can then be used to update or correct specific model predictions without frequent retraining. By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge. In this paper, we undertake a first joint study of the aforementioned two properties, namely controllability and robustness, in the context of LLMs. We demonstrate that state-of-the-art T5 and PaLM (both pretrained and finetuned) could exhibit poor controllability and robustness, which do not scale with increasing model size. As a solution, we propose a novel method - Knowledge Aware FineTuning (KAFT) - to strengthen both controllability and robustness by incorporating counterfactual and irrelevant contexts into standard supervised datasets. Our comprehensive evaluation showcases the utility of KAFT across model architectures and sizes.
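The abstract describes KAFT as augmenting a standard supervised dataset with counterfactual and irrelevant contexts. The sketch below is only an illustration of what such augmented QA training examples might look like under that description; the function name, data fields, and example data are hypothetical and not taken from the paper. The pairing of a counterfactual context with a counterfactual answer (controllability) and an irrelevant context with the original answer (robustness) mirrors the desired behaviors the abstract lays out.

```python
# Illustrative sketch only (not the paper's implementation): building
# KAFT-style finetuning examples by adding counterfactual and irrelevant
# contexts to a standard supervised QA pair, per the abstract's description.
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str
    context: str
    target: str  # answer the model should produce given (question, context)


def build_kaft_examples(question, gold_context, gold_answer,
                        counterfactual_context, counterfactual_answer,
                        irrelevant_context):
    """Return three training examples derived from one QA pair.

    - Standard: relevant context -> gold answer.
    - Controllability: counterfactual context -> counterfactual answer, so the
      model learns to ground its prediction in the context even when it
      conflicts with memorized world knowledge.
    - Robustness: irrelevant context -> gold answer, so the model learns to
      ignore the context and fall back on its internal knowledge.
    """
    return [
        QAExample(question, gold_context, gold_answer),
        QAExample(question, counterfactual_context, counterfactual_answer),
        QAExample(question, irrelevant_context, gold_answer),
    ]


if __name__ == "__main__":
    # Hypothetical example data for illustration only.
    examples = build_kaft_examples(
        question="Who wrote 'Hamlet'?",
        gold_context="'Hamlet' is a tragedy written by William Shakespeare.",
        gold_answer="William Shakespeare",
        counterfactual_context="'Hamlet' is a tragedy written by Christopher Marlowe.",
        counterfactual_answer="Christopher Marlowe",
        irrelevant_context="The Great Barrier Reef is the world's largest coral reef system.",
    )
    for ex in examples:
        print(f"Q: {ex.question}\nContext: {ex.context}\nTarget: {ex.target}\n")
```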
