SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Paper Title

SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Authors

Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren

Abstract

Recent works on diffusion models have demonstrated a strong capability for conditioning image generation, e.g., text-guided image synthesis. This success has inspired many efforts to use large-scale pre-trained diffusion models to tackle a challenging problem: real image editing. Existing works in this area learn a unique textual token corresponding to several images containing the same object. However, in many circumstances only one image is available, such as the painting Girl with a Pearl Earring. Fine-tuning a pre-trained diffusion model on a single image with existing methods causes severe overfitting. Information leakage from the pre-trained diffusion model then prevents the edit from keeping the same content as the given image while creating the new features described by the language guidance. This work aims to address the problem of single-image editing. We propose a novel model-based guidance built upon the classifier-free guidance so that the knowledge from the model trained on a single image can be distilled into the pre-trained diffusion model, enabling content creation even with one given image. Additionally, we propose a patch-based fine-tuning that can effectively help the model generate images of arbitrary resolution. We provide extensive experiments to validate the design choices of our approach and show promising editing capabilities, including changing style, content addition, and object manipulation. The code is available for research purposes at https://github.com/zhang-zx/SINE.git .
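The abstract describes a model-based guidance built on classifier-free guidance but gives no formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the conditional noise prediction of the pre-trained model and that of the single-image fine-tuned model are blended linearly before the usual classifier-free extrapolation. The function names, the blend weight `v`, and the guidance scale `w` are hypothetical and chosen only for illustration. Noise predictions are represented as flat lists of floats to keep the example free of dependencies.

```python
from typing import List

Vector = List[float]


def classifier_free_guidance(eps_uncond: Vector, eps_cond: Vector, w: float) -> Vector:
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction towards the conditional one with guidance scale w."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def model_based_guidance(
    eps_uncond: Vector,
    eps_cond_pretrained: Vector,
    eps_cond_finetuned: Vector,
    w: float,
    v: float,
) -> Vector:
    """Assumed form of model-based guidance (illustrative only).

    The conditional term is a linear blend of the pre-trained model's
    prediction (weight v, driven by the edit prompt) and the fine-tuned
    model's prediction (weight 1 - v, carrying the source image content).
    The blend is then used as the conditional term of classifier-free guidance.
    """
    blended = [
        v * p + (1.0 - v) * f
        for p, f in zip(eps_cond_pretrained, eps_cond_finetuned)
    ]
    return classifier_free_guidance(eps_uncond, blended, w)
```

Under this assumption, `v = 1.0` reduces to plain classifier-free guidance on the pre-trained model, and `v = 0.0` relies entirely on the fine-tuned model for the conditional term; intermediate values trade off edit strength against fidelity to the source image.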
