Paper Title
Efficient neural speech synthesis for low-resource languages through multilingual modeling
Paper Authors
Paper Abstract
Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languages for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent multilingual multi-speaker modeling can be an alternative to monolingual multi-speaker modeling, and explored how data from foreign languages may best be combined with low-resource language data. We found that multilingual modeling can increase the naturalness of low-resource language speech, showed that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models, and saw that the target language naturalness was affected by the strategy used to add foreign language data.
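The abstract notes that target-language naturalness depends on the strategy used to add foreign-language data. One such strategy, shown purely for illustration, is to oversample the small target-language corpus before pooling it with the larger foreign-language corpora, so the target language is not drowned out during training. This is a minimal sketch under that assumption; the function name and parameters are illustrative and not taken from the paper:

```python
import random

def mix_corpora(target_utts, foreign_utts, target_repeat=3, seed=0):
    """Hypothetical data-mixing strategy: repeat the low-resource
    target-language utterances `target_repeat` times, pool them with the
    foreign-language utterances, and shuffle the result into one
    training list."""
    pool = list(target_utts) * target_repeat + list(foreign_utts)
    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    rng.shuffle(pool)
    return pool

# e.g. 100 target-language utterances repeated 3x, pooled with
# 10,000 foreign-language utterances
mixed = mix_corpora([("tgt", i) for i in range(100)],
                    [("for", i) for i in range(10000)])
print(len(mixed))  # 10300
```

How aggressively to oversample (here, `target_repeat`) is exactly the kind of design choice the paper reports as affecting target-language naturalness.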