Paper Title

Adaptive multilingual speech recognition with pretrained models

Paper Authors

Ngoc-Quan Pham, Alex Waibel, Jan Niehues

Paper Abstract

Multilingual speech recognition with supervised learning has achieved great results, as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially for the many languages with limited data. Our work investigates the effectiveness of using two pretrained models for two modalities: wav2vec 2.0 for audio and MBART50 for text, together with adaptive weight techniques, to massively improve recognition quality on public datasets including CommonVoice and Europarl. Overall, we notice a 44% improvement over purely supervised learning, and, more importantly, each technique provides a different reinforcement in different languages. We also explore other possibilities to potentially obtain the best model by slightly adding either depth or relative attention to the architecture.
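
As a rough illustration of the setup the abstract describes, the sketch below pairs a pretrained wav2vec 2.0 audio encoder with an mBART-50 text decoder using Hugging Face's SpeechEncoderDecoderModel. This is not the authors' released code: the checkpoint names, target language, and generation settings are assumptions for illustration, and the paper's adaptive weight techniques and architectural tweaks (extra depth, relative attention) are not reproduced here.

```python
# Minimal sketch: combine a pretrained wav2vec 2.0 encoder with an mBART-50 decoder.
# Checkpoint names and the target language ("de_DE") are illustrative assumptions.
import numpy as np
from transformers import (
    SpeechEncoderDecoderModel,
    Wav2Vec2FeatureExtractor,
    MBart50TokenizerFast,
)

encoder_id = "facebook/wav2vec2-large-xlsr-53"  # multilingual wav2vec 2.0 checkpoint (assumed)
decoder_id = "facebook/mbart-large-50"          # multilingual mBART-50 checkpoint (assumed)

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(encoder_id)
tokenizer = MBart50TokenizerFast.from_pretrained(decoder_id, tgt_lang="de_DE")

# Glue the two pretrained models together; the new cross-attention weights would be
# learned during supervised fine-tuning on ASR data (e.g. CommonVoice / Europarl).
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)
model.config.decoder_start_token_id = tokenizer.convert_tokens_to_ids("de_DE")
model.config.pad_token_id = tokenizer.pad_token_id

# A dummy one-second, 16 kHz waveform standing in for a real utterance.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

# Without fine-tuning the transcript is meaningless; this only shows that the
# combined speech encoder-decoder runs end to end.
generated_ids = model.generate(inputs.input_values, max_length=32)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```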
