Paper Title

Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge

Authors

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Abstract

In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019. The main theme of this challenge is to build a speech synthesizer without any textual information or phonetic labels. To tackle this challenge, we build a system that must address two major components: 1) given speech audio, extract subword units in an unsupervised way, and 2) re-synthesize the audio from novel speakers. The system also needs to balance the codebook performance between the ABX error rate and the bitrate compression rate. Our main contribution is a Transformer-based VQ-VAE for unsupervised unit discovery and a Transformer-based inverter for speech synthesis given the extracted codebook. Additionally, we explore several regularization methods to improve performance further.
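
To make the codebook and bitrate trade-off mentioned in the abstract more concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain nearest-neighbor lookup against a fixed codebook (the quantization step of a generic VQ-VAE) and estimates bitrate as empirical unit entropy multiplied by units per second. The function names `quantize` and `bitrate`, the toy codebook, and the 10 ms frame duration are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook vector.

    Distance is squared Euclidean; this is an illustrative assumption,
    not a detail taken from the paper.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


def bitrate(units, duration_seconds):
    """Estimate bits per second as (number of units) * (empirical entropy
    in bits per unit) / (total duration in seconds)."""
    counts = Counter(units)
    n = len(units)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy / duration_seconds


# Toy data: two codebook entries and four 2-dimensional frames (hypothetical).
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1], [1.1, 0.8]]

units = quantize(frames, codebook)
# Assume each frame covers 10 ms, so four frames span 0.04 s.
rate = bitrate(units, duration_seconds=0.04)
print(units, rate)
```

A larger codebook or a higher frame rate generally raises this bitrate estimate, while a smaller one lowers it at the risk of merging distinct sounds, which is the balance against ABX error rate that the abstract refers to.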
