Title
Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors
Authors
Abstract
Speech-based virtual assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR) and feed the text to downstream dialog models for natural language understanding and response generation. The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors at inference time. To bridge the gap and make dialog models more robust to ASR errors, we leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data. Compared to other approaches for handling ASR errors, such as using ASR lattices or end-to-end methods, our data augmentation approach does not require any modification to the ASR or downstream dialog models; our approach also does not introduce any additional latency at inference time. We perform extensive experiments on benchmark data and show that our approach improves the performance of downstream dialog models in the presence of ASR errors, and that it is particularly effective in low-resource situations where there are constraints on model size or the training data is scarce.
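The core idea of the abstract — injecting simulated ASR noise into clean text before training — can be illustrated with a minimal sketch. The confusion table, function name, and error probabilities below are hypothetical illustrations, not the paper's actual simulator, which would typically learn confusions from paired clean/ASR transcripts.

```python
import random

# Hypothetical confusion table mapping a clean word to plausible ASR
# misrecognitions; a real error simulator would estimate these from
# paired audio transcripts rather than hard-coding them.
CONFUSIONS = {
    "weather": ["whether", "wetter"],
    "play": ["pray", "lay"],
    "book": ["look", "buck"],
}

def simulate_asr_errors(sentence, sub_prob=0.3, del_prob=0.1, rng=None):
    """Inject word-level deletion and substitution noise into clean text."""
    rng = rng or random.Random(0)
    noisy = []
    for word in sentence.split():
        r = rng.random()
        if r < del_prob:
            continue  # simulate a deletion error: drop the word
        if r < del_prob + sub_prob and word in CONFUSIONS:
            noisy.append(rng.choice(CONFUSIONS[word]))  # substitution error
        else:
            noisy.append(word)  # word recognized correctly
    return " ".join(noisy)

# Data augmentation: keep the clean sentences and add noisy copies,
# then train the downstream dialog model on the combined set.
clean = ["play a song", "what is the weather", "book a table"]
augmented = clean + [
    simulate_asr_errors(s, rng=random.Random(i)) for i, s in enumerate(clean)
]
```

Because the augmentation happens entirely at training time on text, neither the ASR system nor the dialog model needs modification, and inference-time latency is unchanged.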