Paper Title
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Paper Authors
Paper Abstract
Through their transfer learning abilities, highly parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
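Since the abstract names the interventions only at a high level, the sketch below is a minimal point of reference rather than the authors' method: it illustrates Elastic Weight Consolidation (EWC), a standard Fisher-information-based penalty against catastrophic forgetting, in PyTorch. The helpers `fisher_diagonal`, `snapshot`, and `ewc_penalty`, and the assumed data loader and model interface, are hypothetical.

```python
# Hypothetical illustration: the paper's information-theoretic interventions
# are not specified in the abstract. One standard instantiation of the idea
# is Elastic Weight Consolidation (EWC), which weights a quadratic
# parameter-drift penalty by the diagonal Fisher information estimated on
# the original linguistic task. All helper names here are illustrative.
import torch
import torch.nn.functional as F


def fisher_diagonal(model, linguistic_loader, device="cpu"):
    """Estimate the diagonal Fisher information of the model's parameters
    on the original (linguistic) task via squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for inputs, targets in linguistic_loader:
        model.zero_grad()
        logits = model(inputs.to(device))  # assumes the model returns logits
        F.cross_entropy(logits, targets.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(linguistic_loader), 1) for n, f in fisher.items()}


def snapshot(model):
    """Detached copy of the pre-trained parameters (theta*), taken before
    fine-tuning on arithmetic data."""
    return {n: p.detach().clone() for n, p in model.named_parameters()}


def ewc_penalty(model, fisher, theta_star, lam=1000.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2: discourages drift in
    the parameters that the linguistic task depends on most."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - theta_star[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During arithmetic fine-tuning, the training objective would then be
#   loss = arithmetic_task_loss + ewc_penalty(model, fisher, theta_star)
# so numeric skill is injected while linguistic parameters are protected.
```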