Paper Title
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps
Paper Authors
Paper Abstract
Neural Language Models (NLMs) have made tremendous advances in recent years, achieving impressive performance on various linguistic tasks. Capitalizing on this, studies in neuroscience have started to use NLMs to study neural activity in the human brain during language processing. However, many questions remain unanswered regarding which factors determine the ability of a neural language model to capture brain activity (a.k.a. its 'brain score'). Here, we take first steps in this direction and examine the impact of test loss, training corpus and model architecture (comparing GloVe, LSTM, GPT-2 and BERT) on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find (1) that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words, with the untrained LSTM outperforming the transformer-based models, being less impacted by context effects; (2) that training NLP models improves brain scores in the same brain regions irrespective of the model's architecture; (3) that perplexity (test loss) is not a good predictor of brain score; (4) that training data have a strong influence on the outcome and, notably, that off-the-shelf models may lack the statistical power to detect brain activations. Overall, we outline the impact of model-training choices and suggest good practices for future studies aiming at explaining the human language system using neural language models.
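To make the 'brain score' metric concrete, below is a minimal sketch of the standard evaluation this abstract describes: a regularized linear mapping from NLM activations to fMRI voxel timecourses, scored by held-out correlation. It assumes the activation features have already been temporally aligned to the fMRI scans (e.g., convolved with a hemodynamic response function); `brain_score` and `perplexity` are illustrative names, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_score(X, Y, alphas=np.logspace(-1, 4, 6), n_splits=5):
    """Cross-validated brain score.

    X: (n_timepoints, n_features) NLM activations aligned to the fMRI scans.
    Y: (n_timepoints, n_voxels) BOLD timecourses.
    Returns the per-voxel Pearson correlation between predicted and observed
    signal, averaged over folds.
    """
    scores = np.zeros(Y.shape[1])
    for train, test in KFold(n_splits=n_splits).split(X):
        # Ridge regression from model activations to voxel responses,
        # with the regularization strength selected by inner CV.
        model = RidgeCV(alphas=alphas).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Pearson correlation per voxel between prediction and ground truth.
        p = (pred - pred.mean(0)) / (pred.std(0) + 1e-8)
        t = (Y[test] - Y[test].mean(0)) / (Y[test].std(0) + 1e-8)
        scores += (p * t).mean(0)
    return scores / n_splits

def perplexity(mean_nll):
    """Perplexity, the test loss referenced in finding (3), is the
    exponential of the mean per-token negative log-likelihood."""
    return float(np.exp(mean_nll))
```

Note that the correlation-based score and the language-modeling loss live on different scales, which is one reason a model's perplexity need not track its brain score, as finding (3) reports.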