Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

Paper title

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

Authors

Wu, Yi-Chiao; Tobing, Patrick Lumban; Kobayashi, Kazuhiro; Hayashi, Tomoki; Toda, Tomoki

Abstract

In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.
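
The abstract does not say how the collapsed speech segment detector decides which segments are problematic. As a rough illustration of the idea of gating a constraint on detected segments only, the following is a minimal sketch that assumes a simple criterion: a frame is flagged when the generated waveform is much louder than the reference speech from a conventional vocoder. The function names, the frame length, and the 6 dB threshold are all hypothetical and are not taken from the paper.

```python
import math


def frame_power_db(signal, frame_len):
    """Return the mean power in dB of each non-overlapping frame."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        powers.append(10.0 * math.log10(mean_sq + 1e-12))
    return powers


def detect_collapsed_frames(generated, reference, frame_len=256, threshold_db=6.0):
    """Flag frames whose power exceeds the reference by more than threshold_db.

    Assumed criterion for illustration only; the paper's detector may differ.
    """
    gen_db = frame_power_db(generated, frame_len)
    ref_db = frame_power_db(reference, frame_len)
    return [g - r > threshold_db for g, r in zip(gen_db, ref_db)]


if __name__ == "__main__":
    # Toy signals: the second frame of the generated waveform is far too loud.
    reference = [0.1] * 8
    generated = [0.1] * 4 + [1.0] * 4
    print(detect_collapsed_frames(generated, reference, frame_len=4))
```

In a pipeline along the lines the abstract describes, only the flagged frames would be regenerated with the constraint applied, so any quality loss from the constraint stays confined to short periods.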
