Paper Title

A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type

Paper Authors

Trevor Cohen, Serguei Pakhomov

Paper Abstract

In recent years there has been a burgeoning interest in the use of computational methods to distinguish between elicited speech samples produced by patients with dementia, and those from healthy controls. The difference between perplexity estimates from two neural language models (LMs) - one trained on transcripts of speech produced by healthy participants and the other trained on transcripts from patients with dementia - as a single feature for diagnostic classification of unseen transcripts has been shown to produce state-of-the-art performance. However, little is known about why this approach is effective, and on account of the lack of case/control matching in the most widely-used evaluation set of transcripts (DementiaBank), it is unclear if these approaches are truly diagnostic, or are sensitive to other variables. In this paper, we interrogate neural LMs trained on participants with and without dementia using synthetic narratives previously developed to simulate progressive semantic dementia by manipulating lexical frequency. We find that perplexity of neural LMs is strongly and differentially associated with lexical frequency, and that a mixture model resulting from interpolating control and dementia LMs improves upon the current state-of-the-art for models trained on transcript text exclusively.
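The two quantities at the heart of the abstract, the perplexity-difference feature and the interpolated (mixture) language model, can be illustrated with a minimal sketch. The paper's models are neural LMs trained on DementiaBank transcripts; the sketch below instead uses toy unigram models with add-one smoothing and made-up token lists purely to show how the two quantities would be computed. Every corpus, function name, and the interpolation weight here is a hypothetical placeholder, not the authors' implementation.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Fit a toy unigram LM with add-one smoothing over a list of token lists.
    One extra slot in the denominator acts as a catch-all for unseen tokens."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    vocab_size = len(counts)
    def prob(tok):
        return (counts[tok] + 1) / (total + vocab_size + 1)
    return prob

def perplexity(prob, transcript):
    """Perplexity of a token sequence under a word-probability function."""
    log_sum = sum(math.log(prob(tok)) for tok in transcript)
    return math.exp(-log_sum / len(transcript))

def interpolate(p_control, p_dementia, lam):
    """Mixture LM: lam * P_control + (1 - lam) * P_dementia."""
    return lambda tok: lam * p_control(tok) + (1 - lam) * p_dementia(tok)

# Toy stand-ins for control and dementia training transcripts (hypothetical).
control_corpus = [["the", "boy", "reaches", "for", "the", "cookie", "jar"]]
dementia_corpus = [["the", "thing", "is", "up", "there", "the", "thing"]]

p_c = train_unigram(control_corpus)
p_d = train_unigram(dementia_corpus)

transcript = ["the", "boy", "takes", "a", "cookie"]

# Single diagnostic feature: difference of the two perplexity estimates.
feature = perplexity(p_c, transcript) - perplexity(p_d, transcript)

# Interpolated (mixture) model built from the control and dementia LMs.
ppl_mix = perplexity(interpolate(p_c, p_d, 0.5), transcript)

print(feature, ppl_mix)
```

Intuitively, a transcript that the dementia-trained LM finds easier to predict than the control-trained LM (a positive difference under the sign convention above) would count as evidence for the dementia class; the interpolation weight of 0.5 is an arbitrary choice for illustration.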
