ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

Paper Title

ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

Authors

Rebryk, Yurii; Beliaev, Stanislav

Abstract

We propose a neural network for zero-shot voice conversion (VC) without any parallel or transcribed data. Our approach uses pre-trained models for automatic speech recognition (ASR) and speaker embedding, obtained from a speaker verification task. Our model is fully convolutional and non-autoregressive except for a small pre-trained recurrent neural network for speaker encoding. ConVoice can convert speech of any length without compromising quality due to its convolutional architecture. Our model has comparable quality to similar state-of-the-art models while being extremely fast.
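
The abstract's claim that a fully convolutional, non-autoregressive model can convert speech of any length can be illustrated with a toy sketch. This is not the authors' implementation: the single-layer "decoder", the feature sizes, and the names `conv1d_same` and `convert` are all hypothetical, and random arrays stand in for the outputs of the pre-trained ASR and speaker-verification models. It only shows that one set of convolution weights, applied in a single pass, yields one output frame per input frame regardless of utterance length.

```python
import numpy as np


def conv1d_same(x, weight, bias):
    """1D convolution over time, zero-padded so output length equals input length.

    x: (T, C_in); weight: (K, C_in, C_out) with odd K; bias: (C_out,).
    """
    k = weight.shape[0]
    pad = k // 2
    x_padded = np.pad(x, ((pad, pad), (0, 0)))
    # windows has shape (T, C_in, K): one K-frame window per output frame.
    windows = np.lib.stride_tricks.sliding_window_view(x_padded, k, axis=0)
    return np.einsum("tck,kco->to", windows, weight) + bias


def convert(content, speaker_embedding, weight, bias):
    """Hypothetical one-layer decoder: content features + target speaker -> frames.

    content: (T, C) stand-in for ASR-derived features of the source utterance.
    speaker_embedding: (D,) stand-in for a speaker-verification embedding.
    """
    num_frames = content.shape[0]
    # Repeat the fixed-size speaker embedding at every time step.
    speaker = np.broadcast_to(
        speaker_embedding, (num_frames, speaker_embedding.shape[0])
    )
    decoder_input = np.concatenate([content, speaker], axis=1)
    # All frames are produced in one pass; no frame depends on a previous output.
    return conv1d_same(decoder_input, weight, bias)


# Assumed toy sizes, chosen only for illustration.
C_CONTENT, D_SPEAKER, N_OUT, KERNEL = 8, 4, 5, 3
rng = np.random.default_rng(0)
weight = rng.standard_normal((KERNEL, C_CONTENT + D_SPEAKER, N_OUT))
bias = np.zeros(N_OUT)
speaker_embedding = rng.standard_normal(D_SPEAKER)

for num_frames in (10, 250):
    content = rng.standard_normal((num_frames, C_CONTENT))
    frames = convert(content, speaker_embedding, weight, bias)
    print(num_frames, frames.shape)
```

The same `weight` array serves both the 10-frame and the 250-frame input, which is the property the abstract attributes to the convolutional architecture; a real system would stack many such layers and follow them with a vocoder.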
