Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Title

Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Authors

Xin Wang, Junichi Yamagishi

Abstract

Neural source-filter (NSF) waveform models generate speech waveforms by morphing sine-based source signals through dilated convolution in the time domain. Although the sine-based source signals help the NSF models to produce voiced sounds with specified pitch, the sine shape may constrain the generated waveform when the target voiced sounds are less periodic. In this paper, we propose a more flexible source signal called cyclic noise, a quasi-periodic noise sequence given by the convolution of a pulse train and a static random noise with a trainable decaying rate that controls the signal shape. We further propose a masked spectral loss to guide the NSF models to produce periodic voiced sounds from the cyclic noise-based source signal. Results from a large-scale listening test demonstrated the effectiveness of the cyclic noise and the masked spectral loss on speaker-independent NSF models in copy-synthesis experiments on the CMU ARCTIC database.
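
The abstract describes cyclic noise as the convolution of a pulse train with a decaying random noise. A minimal NumPy sketch of that idea follows; it is an illustration under stated assumptions, not the authors' implementation. The function names, the exponential form of the decay, the kernel length, and the fixed value of `beta` (which the paper treats as trainable) are all hypothetical choices made here, and unvoiced frames (F0 = 0) are simply left silent.

```python
import numpy as np


def pulse_train(f0, sample_rate):
    """Unit pulse each time the accumulated phase completes a cycle.

    f0 is a per-sample F0 contour in Hz; samples with f0 == 0 are
    treated as unvoiced and receive no pulse (an assumption of this sketch).
    """
    f0 = np.asarray(f0, dtype=float)
    phase = np.cumsum(f0 / sample_rate)              # phase in cycles
    wraps = np.diff(np.floor(phase), prepend=0.0) > 0
    return (wraps & (f0 > 0)).astype(float)


def decaying_noise(length, beta, sample_rate, rng):
    """Gaussian noise shaped by exp(-beta * t); the decay form is assumed."""
    t = np.arange(length) / sample_rate
    return rng.standard_normal(length) * np.exp(-beta * t)


def cyclic_noise(f0, sample_rate, beta=500.0, kernel_len=400, seed=0):
    """Convolve the pulse train with one fixed ("static") noise kernel.

    The same noise segment is repeated at every pulse, so the output is
    noisy within each period but quasi-periodic across periods.
    """
    rng = np.random.default_rng(seed)
    kernel = decaying_noise(kernel_len, beta, sample_rate, rng)
    pulses = pulse_train(f0, sample_rate)
    return np.convolve(pulses, kernel)[: len(pulses)]


if __name__ == "__main__":
    sr = 16000
    f0 = np.full(sr, 125.0)      # one second of a constant 125 Hz contour
    signal = cyclic_noise(f0, sr)
    print(signal.shape)
```

A larger `beta` makes the kernel decay faster, so the output approaches a plain pulse train; a smaller `beta` lets more noise survive within each pitch period. In the paper this rate is learned, whereas here it is a fixed argument for simplicity.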
