Variational Auto-Encoder based Mandarin Speech Cloning

Paper title

Variational Auto-Encoder based Mandarin Speech Cloning

Authors

Xing, Qingyu; Ma, Xiaohan

Abstract

Speech cloning technology is becoming more sophisticated thanks to advances in machine learning. Researchers have built effective models for natural-sounding English speech synthesis and good English speech cloning. However, because of Mandarin's prosodic phrasing and large character set, these models have not yet been fully adapted to Chinese. By creating a new dataset and replacing the Tacotron synthesizer with VAENAR-TTS, we improved the existing speech cloning technique CV2TTS to achieve almost real-time speech cloning while maintaining synthesis quality. In the process, we customized the subjective tests used for synthesis quality assessment by attaching various scenarios, so that subjects focus on the differences between voices, which may make our improvements more advantageous for practical applications. The results of the A/B test, the real-time factor (RTF), and a mean opinion score (MOS) of 2.74 for naturalness and similarity reflect the real-time, high-quality Mandarin speech cloning we achieved.
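
The two objective and subjective metrics named in the abstract follow standard definitions: RTF is the synthesis time divided by the duration of the audio produced (values below 1 mean faster than real time), and MOS is the arithmetic mean of listener ratings, usually on a 1 to 5 scale. A minimal sketch of both is given here; the function names and the sample numbers are hypothetical illustrations, not the authors' evaluation code or data.

```python
from statistics import mean


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF: time spent synthesizing divided by duration of the output audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def mean_opinion_score(ratings: list[int]) -> float:
    """Return MOS: the arithmetic mean of listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return mean(ratings)


# Hypothetical values for illustration only (not taken from the paper).
rtf = real_time_factor(synthesis_seconds=1.5, audio_seconds=3.0)
mos = mean_opinion_score([3, 2, 3, 4, 2])
print(f"RTF = {rtf:.2f}, MOS = {mos:.2f}")
```

In practice MOS is typically reported separately for naturalness and for speaker similarity, each averaged over many listeners and utterances.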
