Paper title
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
Authors
Abstract
GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models. However, most of their architectures require dozens of billions of floating-point operations per second (GFLOPS) to generate speech waveforms in a sample-wise manner. This still makes GAN vocoders challenging to run on ordinary CPUs without accelerators or parallel computers. In this work, we propose a new architecture for GAN vocoders that mainly relies on recurrent and fully-connected networks to directly generate the time-domain signal in a framewise manner. This considerably reduces the computational cost and enables very fast generation on both GPUs and low-complexity CPUs. Experimental results show that our Framewise WaveGAN vocoder achieves significantly higher quality than auto-regressive maximum-likelihood vocoders such as LPCNet, at a very low complexity of 1.2 GFLOPS. This makes GAN vocoders more practical on edge and low-power devices.
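The following back-of-envelope sketch makes the cost argument concrete. It counts approximate FLOPs for a hypothetical network with one GRU layer and one fully-connected output layer. That network is run either once per output sample or once per frame. All of the following are assumptions chosen for illustration: the layer sizes, the 16 kHz sampling rate, the 160-sample frame, and the convention of 2 FLOPs per multiply-add (biases and nonlinearities ignored). This is not the Framewise WaveGAN architecture, and it does not reproduce the paper's 1.2 GFLOPS figure. It only shows why running the recurrent part at frame rate amortizes its cost over many samples, with a dense layer emitting a whole frame of samples per step.

```python
def gru_flops(input_dim: int, hidden_dim: int) -> int:
    """Approximate FLOPs for one GRU step: 3 gates, each with an input and a
    recurrent matrix-vector product, counting 2 FLOPs per multiply-add.
    Biases and elementwise nonlinearities are ignored."""
    return 2 * 3 * (input_dim * hidden_dim + hidden_dim * hidden_dim)


def dense_flops(input_dim: int, output_dim: int) -> int:
    """Approximate FLOPs for one fully-connected layer (matrix-vector product)."""
    return 2 * input_dim * output_dim


def gflops_per_second(flops_per_step: int, steps_per_second: float) -> float:
    """Convert a per-step FLOP count into GFLOPS at a given step rate."""
    return flops_per_step * steps_per_second / 1e9


SAMPLE_RATE = 16_000  # assumption: 16 kHz speech
FRAME_SIZE = 160      # assumption: hypothetical 10 ms frame
HIDDEN = 512          # assumption: hypothetical GRU width (input width = HIDDEN)

# Framewise: one GRU step per frame; the dense layer emits FRAME_SIZE samples at once.
frame_step = gru_flops(HIDDEN, HIDDEN) + dense_flops(HIDDEN, FRAME_SIZE)
framewise = gflops_per_second(frame_step, SAMPLE_RATE / FRAME_SIZE)

# Samplewise: the same GRU run once per sample; the dense layer emits 1 sample.
sample_step = gru_flops(HIDDEN, HIDDEN) + dense_flops(HIDDEN, 1)
samplewise = gflops_per_second(sample_step, SAMPLE_RATE)

print(f"framewise:  {framewise:.2f} GFLOPS")
print(f"samplewise: {samplewise:.2f} GFLOPS")
print(f"reduction:  {samplewise / framewise:.0f}x")
```

With these assumed sizes, the framewise variant needs about 0.33 GFLOPS and the samplewise one about 50 GFLOPS, a reduction of roughly 150x. The exact ratio depends entirely on the chosen layer sizes. The dense output layer grows with the frame size, but it runs 160 times less often.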