Paper Title

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation

Authors

Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda

Abstract

In this paper, we propose a parallel WaveGAN (PWG)-like neural vocoder with a quasi-periodic (QP) architecture to improve the pitch controllability of PWG. PWG is a compact non-autoregressive (non-AR) speech generation model whose generation speed is much faster than real time. When used as a vocoder to generate speech from acoustic features such as spectral and prosodic features, PWG produces high-fidelity speech. However, when the input acoustic features include unseen pitches, the pitch accuracy of PWG-generated speech degrades, because the fixed and generic network of PWG has no prior knowledge of speech periodicity. The proposed QPPWG adopts a pitch-dependent dilated convolution network (PDCNN) module, which introduces pitch information into PWG via a dynamically changing network architecture, to improve the pitch controllability and speech modeling capability of vanilla PWG. Both objective and subjective evaluation results show that QPPWG-generated speech achieves higher pitch accuracy and comparable speech quality when the QPPWG model size is only 70% of that of vanilla PWG.
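The idea of a pitch-dependent dilated convolution can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the dilation formula or the layer layout, so the scaling rule (pitch period in samples divided by a hypothetical `dense_factor`), the function names, and the simplified causal two-tap filter are all assumptions made for illustration only.

```python
import numpy as np

def pitch_dependent_dilation(f0, sample_rate, dense_factor=4, base_dilation=1):
    """Per-sample dilation size (assumed rule: proportional to the pitch period).

    f0 holds one fundamental-frequency value in Hz per sample; 0 marks unvoiced.
    dense_factor is a hypothetical knob: taps per pitch period.
    """
    f0 = np.asarray(f0, dtype=float)
    # Pitch period in samples; unvoiced samples get period 0.
    period = np.where(f0 > 0, sample_rate / np.maximum(f0, 1e-8), 0.0)
    # Round to an integer scale and never go below 1 (covers unvoiced samples).
    scale = np.maximum(np.rint(period / dense_factor), 1).astype(int)
    return base_dilation * scale

def pd_causal_conv(x, dilation, w_prev, w_cur):
    """Two-tap causal convolution whose look-back distance varies per sample."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x)) - dilation          # index of the past tap
    prev = np.where(idx >= 0, x[np.clip(idx, 0, None)], 0.0)  # zero-pad the start
    return w_cur * x + w_prev * prev

if __name__ == "__main__":
    d = pitch_dependent_dilation([100.0, 200.0, 0.0], sample_rate=16000)
    print(d)  # lower pitch -> longer period -> larger dilation
    print(pd_causal_conv(np.arange(6), np.array([1, 1, 2, 2, 3, 3]), 1.0, 1.0))
```

In this sketch the effective receptive field stretches for low pitches and shrinks for high ones, which is the sense in which the network architecture changes dynamically with the input pitch. A fixed-dilation layer would correspond to passing a constant `dilation` array.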
