Paper Title

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

Authors

Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

Abstract

The goal of accent conversion (AC) is to convert the accent of speech into a target accent while preserving the content and the speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods either rely on reference utterances at inference time or fail to preserve speaker identity. To address these issues, we propose a zero-shot, reference-free accent conversion method that can convert unseen speakers' utterances into a target accent. A Pseudo Siamese Disentanglement Network (PSDN) is proposed to disentangle the accent from the content representation. Experimental results show that our model generates speech samples with much higher accentedness than the input and comparable naturalness, on two-way conversion including foreign-to-native and native-to-foreign.
