Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Paper Title

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Authors

Bahjat Kawar, Roy Ganz, Michael Elad

Abstract

Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.
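As a rough illustration of the guidance idea mentioned in the abstract, the sketch below shifts the mean of one reverse diffusion step by the scaled gradient of log p(y | x) from a classifier. It is a minimal toy under stated assumptions, not the authors' implementation: the classifier here is a hypothetical linear-softmax model with no time input (the paper uses a time-dependent, adversarially trained network), and the names `log_softmax_grad`, `guided_mean`, and `scale` are chosen for illustration only.

```python
import math


def log_softmax_grad(W, b, x, y):
    """Gradient of log p(y | x) w.r.t. x for a toy linear-softmax classifier.

    W is a list of per-class weight rows, b a list of per-class biases.
    Assumption: a real guidance classifier would also take the timestep t.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # d/dx log p(y | x) = W[y] - sum_k p_k * W[k]
    return [
        W[y][j] - sum(p * row[j] for p, row in zip(probs, W))
        for j in range(len(x))
    ]


def guided_mean(mu, sigma2, grad, scale=1.0):
    """Shift the reverse-step mean by scale * variance * classifier gradient."""
    return [m + scale * s2 * g for m, s2, g in zip(mu, sigma2, grad)]


# Toy usage: two classes, two-dimensional "image", target class 0.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_t = [0.0, 0.0]
grad = log_softmax_grad(W, b, x_t, y=0)
new_mu = guided_mean(mu=[0.0, 0.0], sigma2=[0.1, 0.1], grad=grad, scale=2.0)
```

In this toy the gradient points toward the target class and away from the other one, so the shifted mean moves the sample in that direction. The abstract's argument is that for a standard (non-robust) deep classifier this gradient may be unreliable, which motivates swapping in an adversarially robust one.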
