Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework

Paper Title

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework

Authors

Yizhou Peng, Jicheng Zhang, Haobo Zhang, Haihua Xu, Hao Huang, Eng Siong Chng

Abstract

Humans can recognize speech and, at the same time, the particular accent in which it is spoken. However, current state-of-the-art automatic speech recognition (ASR) systems can rarely do so. In this paper, we propose a multilingual approach to recognizing English speech together with the speaker's accent, using the DNN-HMM framework. Specifically, we treat different accents of English as different languages, merge them, and train a multilingual ASR system. During decoding, we conduct two experiments. One is monolingual ASR-based decoding, with the accent information embedded at the phone level, which realizes word-based accent recognition (AR); the other is multilingual ASR-based decoding, which realizes an approximated utterance-based AR. Experimental results on an 8-accent English speech recognition task show that both methods yield WERs close to those of conventional ASR systems that completely ignore accent, as well as the desired AR accuracy. In addition, we conduct extensive analysis of the proposed method, covering transfer learning with out-of-domain data exploitation, cross-accent recognition confusion, and the characteristics of accented words.
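The idea of embedding accent information at the phone level can be illustrated with a minimal sketch. It is not the authors' implementation: the accent labels `US` and `UK`, the `_ACCENT` suffix convention, the toy lexicon, and all function names are assumptions made for illustration. The sketch tags every phone of a per-accent lexicon with its accent, merges the lexicons so that each word has one pronunciation per accent, and then reads a word's accent back from the phone tags of a decoded pronunciation.

```python
from collections import Counter


def tag_lexicon(lexicon, accent):
    """Return a copy of `lexicon` whose phones carry an accent suffix, e.g. 'AO' -> 'AO_US'."""
    return {word: [f"{p}_{accent}" for p in phones] for word, phones in lexicon.items()}


def merge_lexicons(lexicons_by_accent):
    """Merge per-accent lexicons; each word gets one tagged pronunciation per accent."""
    merged = {}
    for accent, lexicon in lexicons_by_accent.items():
        for word, phones in tag_lexicon(lexicon, accent).items():
            merged.setdefault(word, []).append(phones)
    return merged


def word_accent(tagged_phones):
    """Read a word's accent from its phone tags (most common suffix wins)."""
    tags = [p.rsplit("_", 1)[1] for p in tagged_phones]
    return Counter(tags).most_common(1)[0][0]


# Hypothetical toy lexicon shared by two hypothetical accents.
base = {"water": ["W", "AO", "T", "ER"]}
merged = merge_lexicons({"US": base, "UK": base})

# Suppose the decoder picked the second pronunciation of "water".
decoded = merged["water"][1]
accent = word_accent(decoded)
```

Under these assumptions, the accent of each recognized word falls out of the decoded phone sequence itself, which is one way to read the abstract's "word-based" accent recognition.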
