Paper Title
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network
Authors
Abstract
The goal of accent conversion (AC) is to convert the accent of speech into a target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods either rely on reference utterances at inference time or fail to preserve speaker identity. To address these issues, we propose a zero-shot, reference-free accent conversion method that can convert utterances from unseen speakers into a target accent. A Pseudo Siamese Disentanglement Network (PSDN) is proposed to disentangle the accent from the content representation. Experimental results show that, for conversion in both directions (foreign-to-native and native-to-foreign), our model generates speech samples with much higher accentedness than the input and comparable naturalness.