Paper Title

Sketch-Guided Text-to-Image Diffusion Models

Paper Authors

Andrey Voynov, Kfir Aberman, Daniel Cohen-Or

Paper Abstract

Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text prompt. However, these powerful pretrained models still lack control handles that can guide the spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model with a spatial map from another domain (e.g., a sketch) during inference time. Unlike previous works, our method does not require training a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained on only a few thousand images and constitutes a differentiable guiding-map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality, which allow the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain. Project page: sketch-guided-diffusion.github.io
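
The sketch below illustrates the two ingredients the abstract describes: a per-pixel MLP (the LGP) that maps deep denoising-network features to an edge map, and a guidance step that backpropagates the disagreement between that predicted map and a target sketch into the intermediate noisy image. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the class and function names, layer sizes, step size, and the toy convolutional feature extractor standing in for the DDPM U-Net are all assumptions made for clarity.

```python
# Minimal illustrative sketch of the LGP idea; names, sizes, and the toy
# feature extractor are assumptions, not the paper's reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyFeatureExtractor(nn.Module):
    """Hypothetical stand-in for the core DDPM U-Net from which deep features
    would be extracted at several layers; two conv layers suffice here."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, feat_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h1 = F.silu(self.conv1(x))
        h2 = F.silu(self.conv2(h1))
        # Concatenate features from both layers, mimicking multi-layer extraction.
        return torch.cat([h1, h2], dim=1)  # (B, 2*feat_ch, H, W)


class LatentGuidancePredictor(nn.Module):
    """Per-pixel MLP: maps the feature vector at each pixel to one edge value."""

    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        b, c, h, w = features.shape
        x = features.permute(0, 2, 3, 1).reshape(-1, c)    # each pixel is an independent sample
        out = self.mlp(x)
        return out.view(b, h, w, 1).permute(0, 3, 1, 2)    # (B, 1, H, W) predicted edge map


def sketch_guidance_step(noisy_image, target_sketch, extractor, lgp, step_size=50.0):
    """One guidance update: predict an edge map from the current noisy image,
    measure its disagreement with the target sketch, and step the image down
    the gradient of that loss so later denoising steps follow the sketch."""
    noisy_image = noisy_image.detach().requires_grad_(True)
    features = extractor(noisy_image)
    predicted_edges = lgp(features)
    loss = F.mse_loss(predicted_edges, target_sketch)
    (grad,) = torch.autograd.grad(loss, noisy_image)
    return noisy_image - step_size * grad


if __name__ == "__main__":
    extractor = ToyFeatureExtractor()
    lgp = LatentGuidancePredictor(feature_dim=128)          # 2 * feat_ch channels
    noisy = torch.randn(1, 3, 64, 64)                       # intermediate DDPM image
    sketch = (torch.rand(1, 1, 64, 64) > 0.95).float()      # placeholder target edge map
    guided = sketch_guidance_step(noisy, sketch, extractor, lgp)
    print(guided.shape)  # torch.Size([1, 3, 64, 64])
```

In the actual method the features come from intermediate activations of the pretrained denoising network at each diffusion step, and the LGP is first fitted on a few thousand (feature, edge-map) pairs; the toy extractor above only serves to make the gradient path from the sketch loss back to the noisy image explicit.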
