UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image

Paper Title

UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image

Authors

Dani Valevski, Matan Kalman, Eyal Molad, Eyal Segalis, Yossi Matias, Yaniv Leviathan

Abstract

Text-driven image generation methods have shown impressive results recently, allowing casual users to generate high quality images by providing textual descriptions. However, similar capabilities for editing existing images are still out of reach. Text-driven image editing methods usually need edit masks, struggle with edits that require significant visual changes and cannot easily keep specific details of the edited portion. In this paper we make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image. We also show that initializing the stochastic sampler with a noised version of the base image before the sampling and interpolating relevant details from the base image after sampling further increase the quality of the edit operation. Combining these observations, we propose UniTune, a novel image editing method. UniTune gets as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image. UniTune does not require additional inputs, like masks or sketches, and can perform multiple edits on the same image without retraining. We test our method using the Imagen model in a range of different use cases. We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.
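The abstract mentions initializing the stochastic sampler with a noised version of the base image. As a rough illustration of what "noising" an image usually means in diffusion models, the sketch below applies the standard DDPM forward step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not the authors' implementation: the linear beta schedule, the step index `t=600`, and the helper names `linear_alpha_bar` and `noise_base_image` are all assumptions made for this example, and the abstract does not specify the schedule or noise level used with Imagen.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumptions for illustration only.
    Requires num_steps >= 2.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def noise_base_image(pixels, t, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar[t]) * x_0 + sqrt(1 - alpha_bar[t]) * eps."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    # eps is drawn independently per pixel from a standard normal distribution.
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
base = [0.5, -0.25, 1.0, -1.0]  # toy "image": pixel values scaled to [-1, 1]
# Hypothetical starting point for sampling: the base image noised to step 600.
start = noise_base_image(base, t=600, alpha_bar=alpha_bar, rng=rng)
```

Under these assumptions, a larger `t` keeps less of the base image in the starting point and a smaller `t` keeps more, which is one way to picture the trade-off between fidelity to the input and freedom to edit.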
