Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Paper title

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Authors

Muhammad Umar Farooq, Thomas Hain

Abstract

Multilingual automatic speech recognition (ASR) systems mostly benefit low-resource languages but suffer performance degradation on several languages relative to their monolingual counterparts. Few studies have focused on understanding how languages behave in multilingual speech recognition setups. In this paper, a novel data-driven approach is proposed to investigate cross-lingual acoustic-phonetic similarities. The technique measures the similarity between the posterior distributions that various monolingual acoustic models produce for a target speech signal. Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form. The analysis shows that the closeness of two languages cannot be reliably estimated from the size of their overlapping phoneme set. Entropy analysis of the proposed mapping networks shows that a language with less overlap can be more amenable to cross-lingual transfer, and hence more beneficial in a multilingual setup. Finally, the proposed posterior transformation approach is used to fuse monolingual models for a target language, achieving a relative improvement of about 8% over the monolingual counterpart.
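
The abstract does not give the exact similarity or entropy formulas, so the following is only an illustrative sketch of the kind of quantities involved, assuming the mapped posteriors are available as lists of per-frame probability vectors over a common set of target-language classes. The function names (`entropy`, `mean_frame_entropy`, `symmetric_kl`) are hypothetical and are not taken from the paper; symmetric KL divergence is used here as one plausible stand-in for a similarity measure, not as the authors' method.

```python
import math


def entropy(posterior):
    """Shannon entropy (in nats) of a single posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def mean_frame_entropy(posteriors):
    """Average per-frame entropy over a sequence of posterior vectors.

    Lower values mean the (mapped) model is more confident on the target
    speech, which is one possible reading of 'amenable to transfer'.
    """
    return sum(entropy(p) for p in posteriors) / len(posteriors)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def symmetric_kl(p, q):
    """Symmetrised KL divergence (assumed here as a distance between posteriors)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Toy example: two frames, three classes (made-up numbers for illustration only).
target_model = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
mapped_model = [[0.6, 0.3, 0.1], [0.3, 0.5, 0.2]]

avg_entropy = mean_frame_entropy(mapped_model)
avg_distance = sum(symmetric_kl(p, q)
                   for p, q in zip(target_model, mapped_model)) / len(target_model)
```

In this sketch a smaller `avg_distance` would indicate that the mapped posteriors of a source-language model lie closer to those of the target-language model on the same frames.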

Paper title