Paper Title

The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification

Authors

Changye Li, Trevor Cohen, Serguei Pakhomov

Abstract

Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment. The feasibility of deploying automated tools that can classify language samples obtained from speech in large-scale clinical settings depends on the ability to capture and automatically transcribe the speech for subsequent analysis. However, the impressive performance of self-supervised learning (SSL) automatic speech recognition (ASR) models with curated speech data is not apparent with challenging speech samples from clinical settings. One of the key questions for successfully applying ASR models for clinical applications is whether imperfect transcripts they generate provide sufficient information for downstream tasks to operate at an acceptable level of accuracy. In this study, we examine the relationship between the errors produced by several deep learning ASR systems and their impact on the downstream task of dementia classification. One of our key findings is that, paradoxically, ASR systems with relatively high error rates can produce transcripts that result in better downstream classification accuracy than classification based on verbatim transcripts.
