Paper Title
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training
Paper Authors
Paper Abstract
Denoising diffusion probabilistic models (diffusion models for short) require a large number of iterations in inference to achieve generation quality that matches or surpasses that of state-of-the-art generative models, which invariably results in slow inference speed. Previous approaches aim to optimize the choice of inference schedule over a few iterations to speed up inference. However, this reduces generation quality, mainly because the inference process is optimized separately rather than jointly with the training process. In this paper, we propose InferGrad, a diffusion model for vocoders that incorporates the inference process into training, to reduce the number of inference iterations while maintaining high generation quality. More specifically, during training, we generate data from random noise through a reverse process under an inference schedule with a few iterations, and impose a loss to minimize the gap between the generated and ground-truth data samples. In this way, unlike existing approaches, the training of InferGrad takes the inference process into account. The advantages of InferGrad are demonstrated through experiments on the LJSpeech dataset, which show that InferGrad achieves better voice quality than the baseline WaveGrad under the same conditions, and maintains the same voice quality as the baseline with a $3$x speedup ($2$ iterations for InferGrad vs. $6$ iterations for WaveGrad).
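To make the idea concrete, below is a minimal PyTorch sketch of an inference-aware training step in the spirit described by the abstract. The toy denoiser, the 2-step noise schedule, the simplified forward diffusion, the L1 losses, and the 0.1 loss weighting are all illustrative assumptions, not the paper's actual model, schedule, or hyperparameters.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy stand-in for a WaveGrad-style denoiser: predicts noise from (x_t, t)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        t_feat = t.expand(x_t.size(0), 1)               # broadcast scalar timestep to the batch
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def reverse_process(model, x_T, betas):
    """Run a few-step reverse (inference) process under a short noise schedule."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = x_T
    for i in reversed(range(len(betas))):
        t = torch.full((1, 1), i / len(betas), dtype=x.dtype)
        eps = model(x, t)
        # DDPM-style posterior mean update (stochastic noise term omitted for brevity)
        x = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
    return x

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
betas_infer = torch.tensor([0.1, 0.5])                  # assumed 2-iteration inference schedule

x0 = torch.randn(8, 128)                                # stand-in for ground-truth waveform segments
noise = torch.randn_like(x0)
t = torch.rand(1, 1)
x_t = torch.sqrt(1.0 - t) * x0 + torch.sqrt(t) * noise  # simplified forward diffusion

# Standard diffusion (noise-prediction) loss
loss_diffusion = (model(x_t, t) - noise).abs().mean()

# Inference-aware loss: generate samples through the short reverse process
# and penalize the gap between generated and ground-truth data
x_gen = reverse_process(model, torch.randn_like(x0), betas_infer)
loss_infer = (x_gen - x0).abs().mean()

loss = loss_diffusion + 0.1 * loss_infer                # 0.1 is an assumed weighting
opt.zero_grad()
loss.backward()
opt.step()
```

How the inference-aware term is weighted against the standard diffusion objective is an assumption here; the sketch simply sums the two losses so that gradients from the few-step reverse process reach the denoiser during training.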