Speech Synthesis and Control Using Differentiable DSP

Paper title

Speech Synthesis and Control Using Differentiable DSP

Authors

Giorgio Fabbro, Vladimir Golkov, Thomas Kemp, Daniel Cremers

Abstract

Modern text-to-speech systems are able to produce natural and high-quality speech, but speech contains factors of variation (e.g. pitch, rhythm, loudness, timbre) that text alone cannot contain. In this work we move towards a speech synthesis system that can produce diverse speech renditions of a text by allowing (but not requiring) explicit control over the various factors of variation. We propose a new neural vocoder that offers control of such factors of variation. This is achieved by employing differentiable digital signal processing (DDSP) (previously used only for music rather than speech), which exposes these factors of variation. The results show that the proposed approach can produce natural speech with realistic timbre, and individual factors of variation can be freely controlled.
