ASR Error Detection via Audio-Transcript Entailment

Paper Title

ASR Error Detection via Audio-Transcript Entailment

Authors

Meripo, Nimshi Venkat; Konam, Sandeep

Abstract

Despite the improved performance of the latest Automatic Speech Recognition (ASR) systems, transcription errors remain unavoidable. When ASR output is used to help with clinical documentation, these errors can have a considerable impact in critical domains such as healthcare. Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between an audio segment and its corresponding transcript segment. Our intuition is that there should be bidirectional entailment between audio and transcript when there is no recognition error, and no such entailment when there is one. The proposed model uses an acoustic encoder and a linguistic encoder to model the speech and the transcript, respectively. The encoded representations of both modalities are fused to predict the entailment. Since doctor-patient conversations are used in our experiments, particular emphasis is placed on medical terms. Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors specifically, improving upon a strong baseline by 12% and 15.4%, respectively.
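The fuse-then-classify idea in the abstract, and the classification error rate it reports, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' architecture: the acoustic and linguistic encoders are replaced by fixed-length placeholder vectors, fusion is plain concatenation, the classifier is a single logistic layer, and the names `fuse`, `predict_entailment`, and `classification_error_rate` are hypothetical.

```python
import math


def fuse(audio_vec, text_vec):
    """Fuse the two modality encodings by concatenation.

    Assumption: the abstract only says the representations are "fused";
    concatenation is one simple choice, not necessarily the paper's.
    """
    return list(audio_vec) + list(text_vec)


def predict_entailment(audio_vec, text_vec, weights, bias, threshold=0.5):
    """Return (probability, label) for one audio/transcript segment pair.

    Label 1 means "entailment" (no recognition error is predicted);
    label 0 means "no entailment" (an ASR error is predicted).
    """
    fused = fuse(audio_vec, text_vec)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))  # logistic (sigmoid) output
    return prob, int(prob >= threshold)


def classification_error_rate(predictions, labels):
    """Fraction of segments whose predicted label differs from the reference."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labelled segment is required")
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


if __name__ == "__main__":
    # Placeholder encodings standing in for real encoder outputs.
    audio_encoding = [0.2, -0.1]
    text_encoding = [0.4, 0.3]
    toy_weights = [1.0, 1.0, 1.0, 1.0]  # untrained, illustrative values
    prob, label = predict_entailment(audio_encoding, text_encoding, toy_weights, bias=0.0)
    print(f"entailment probability={prob:.3f}, label={label}")
    print(classification_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

In a real system the placeholder vectors would come from trained encoders and the weights would be learned; the sketch only shows how a fused representation maps to a binary entailment decision and how the error rate over segments is computed.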
