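Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications -- A Case Study on German Oral History Interviews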

Paper Title

Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications -- A Case Study on German Oral History Interviews

Authors

Michael Gref, Oliver Walter, Christoph Schmidt, Sven Behnke, Joachim Köhler

Abstract

While recent automatic speech recognition systems achieve remarkable performance when large amounts of adequate, high-quality annotated speech data are used for training, the same systems often achieve only unsatisfactory results for tasks in domains that greatly deviate from the conditions represented by the training data. For many real-world applications, there is a lack of sufficient data that can be directly used for training robust speech recognition systems. To address this issue, we propose and investigate an approach that performs a robust acoustic model adaption to a target domain in a cross-lingual, multi-staged manner. Our approach enables the exploitation of large-scale training data from other domains in both the same and other languages. We evaluate our approach using the challenging task of German oral history interviews, where we achieve a relative word error rate reduction of more than 30% compared to a model trained from scratch only on the target domain, and of 6-7% relative compared to a model trained robustly on 1000 hours of same-language out-of-domain training data.
