Paper title
ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network
Paper authors
Abstract
We propose a neural network for zero-shot voice conversion (VC) that requires no parallel or transcribed data. Our approach uses pre-trained models for automatic speech recognition (ASR) and for speaker embedding, the latter obtained from a speaker verification task. The model is fully convolutional and non-autoregressive, except for a small pre-trained recurrent neural network used for speaker encoding. Because of its convolutional architecture, ConVoice can convert speech of any length without compromising quality. Its quality is comparable to that of similar state-of-the-art models, while inference is extremely fast.
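To illustrate why a fully convolutional, non-autoregressive converter can handle utterances of any length, here is a minimal toy sketch in plain Python. It is not the ConVoice implementation: the layer count, kernel size, feature dimensions, random weights, and the idea of conditioning by concatenating the speaker embedding to every frame are all assumptions made for illustration, and the names `conv1d_same`, `random_kernel`, and `convert` are hypothetical. The content features stand in for the output of a pre-trained ASR model, and the speaker vector stands in for a speaker-verification embedding.

```python
import random


def conv1d_same(frames, weights, bias):
    """1-D convolution over time with zero padding, so output length equals input length.

    frames:  list of T frames, each a list of c_in floats
    weights: kernel indexed as [k][c_in][c_out], with k odd
    bias:    list of c_out floats
    """
    k = len(weights)
    c_in = len(weights[0])
    c_out = len(bias)
    half = k // 2
    T = len(frames)
    out = []
    for t in range(T):
        acc = list(bias)
        for j in range(k):
            src = t + j - half
            if 0 <= src < T:  # positions outside the utterance act as zeros
                frame = frames[src]
                for i in range(c_in):
                    x = frame[i]
                    w_row = weights[j][i]
                    for o in range(c_out):
                        acc[o] += x * w_row[o]
        out.append(acc)
    return out


def relu(frames):
    return [[max(0.0, v) for v in f] for f in frames]


def random_kernel(rng, k, c_in, c_out):
    # Random weights stand in for trained parameters (assumption for the demo).
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k)]


def convert(content, speaker_embedding, layers):
    """Non-autoregressive conversion: no output frame depends on a previous output frame."""
    # Condition every frame on the target speaker by concatenating the embedding.
    x = [frame + speaker_embedding for frame in content]
    for idx, (w, b) in enumerate(layers):
        x = conv1d_same(x, w, b)
        if idx < len(layers) - 1:
            x = relu(x)
    return x


rng = random.Random(0)
# Toy sizes chosen for the demo, not taken from the paper.
CONTENT_DIM, SPEAKER_DIM, HIDDEN_DIM, MEL_DIM = 8, 4, 16, 10
layers = [
    (random_kernel(rng, 3, CONTENT_DIM + SPEAKER_DIM, HIDDEN_DIM), [0.0] * HIDDEN_DIM),
    (random_kernel(rng, 3, HIDDEN_DIM, MEL_DIM), [0.0] * MEL_DIM),
]
speaker = [rng.uniform(-1, 1) for _ in range(SPEAKER_DIM)]

for T in (5, 37):
    content = [[rng.uniform(-1, 1) for _ in range(CONTENT_DIM)] for _ in range(T)]
    mel = convert(content, speaker, layers)
    print(T, len(mel), len(mel[0]))  # the number of output frames always matches T
```

The same weights are applied at every time step, so nothing in the model depends on the utterance length, and each output frame only sees a fixed local window of input frames. Because no frame waits on a previously generated frame, all frames can in principle be computed in parallel, which is the usual reason non-autoregressive models are fast.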