Improved Vector Quantized Diffusion Models

Paper Title

Improved Vector Quantized Diffusion Models

Authors

Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Abstract

Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but it can still generate low-quality samples or images that are only weakly correlated with the text input. We find that these issues are mainly due to a flawed sampling strategy. In this paper, we propose two techniques to further improve the sample quality of VQ-Diffusion. 1) We explore classifier-free guidance sampling for discrete denoising diffusion models and propose a more general and effective implementation of classifier-free guidance. 2) We present a high-quality inference strategy to alleviate the joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on various datasets to validate their effectiveness and show that the improved VQ-Diffusion surpasses the vanilla version by large margins. We achieve an FID score of 8.44 on MSCOCO, surpassing VQ-Diffusion by 5.42 FID. When trained on ImageNet, we improve the FID score from 11.89 to 4.83, demonstrating the superiority of the proposed techniques.
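The abstract does not spell out how classifier-free guidance is applied to a discrete model, so the following is only a minimal sketch of the commonly used formulation, log p(x|y) + s * (log p(x|y) - log p(x)), applied to a categorical distribution over codebook tokens at a single position. It does not reproduce the "more general and effective implementation" the paper proposes. The function names, the toy logits, and the guidance scale `scale` are all hypothetical and chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def guided_log_probs(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional token logits (hypothetical helper).

    Computes log p(x|y) + scale * (log p(x|y) - log p(x)) per token and
    renormalizes so the result is again a valid categorical distribution.
    """
    log_c = log_softmax(cond_logits)
    log_u = log_softmax(uncond_logits)
    mixed = [c + scale * (c - u) for c, u in zip(log_c, log_u)]
    return log_softmax(mixed)


# Toy logits for one token position (assumed values, not from the paper).
cond = [2.0, 1.0, 0.1]      # prediction given the text prompt
uncond = [1.0, 1.0, 1.0]    # prediction with the prompt dropped

for s in (0.0, 1.0, 3.0):
    probs = [math.exp(v) for v in guided_log_probs(cond, uncond, s)]
    print(s, [round(p, 3) for p in probs])
```

With `scale = 0` the result is the plain conditional distribution; larger scales shift probability mass toward tokens that the text condition favors relative to the unconditional prediction.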
