Title
The Stable Artist: Steering Semantics in Diffusion Latent Space
Authors
Abstract
Large, text-conditioned generative diffusion models have recently attracted considerable attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results in a one-shot fashion is almost infeasible. Instead, text-guided image generation involves the user making many slight changes to the input in order to iteratively carve out the envisioned image. Yet slight changes to the input prompt often lead to entirely different images, so the artist's control is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach that enables fine-grained control of the image generation process. Its main component is semantic guidance (SEGA), which steers the diffusion process along a variable number of semantic directions. This allows for subtle edits to images, changes in composition and style, and optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as 'carbon emission'. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.
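The abstract does not spell out how SEGA combines several semantic directions. The sketch below is one plausible reading, not the authors' implementation: it assumes each edit concept contributes the difference between its conditioned noise estimate and the unconditioned one, that only the largest-magnitude entries of that difference are kept, and that the result is added on top of standard classifier-free guidance. The function name `semantic_guidance`, its parameters, and the default values are hypothetical, the percentile here is taken over all entries for simplicity, and NumPy arrays stand in for real model noise estimates.

```python
import numpy as np


def semantic_guidance(eps_uncond, eps_prompt, eps_edits, guidance_scale=7.5,
                      edit_scales=None, directions=None, threshold_pct=95.0):
    """Hypothetical sketch: classifier-free guidance plus thresholded edit terms.

    eps_uncond:    noise estimate for the empty prompt
    eps_prompt:    noise estimate for the main text prompt
    eps_edits:     list of noise estimates, one per edit concept
    edit_scales:   per-edit strength (assumed default 5.0 each)
    directions:    +1 steers toward a concept, -1 steers away from it
    threshold_pct: entries below this percentile of |delta| are zeroed out
    """
    n = len(eps_edits)
    edit_scales = edit_scales if edit_scales is not None else [5.0] * n
    directions = directions if directions is not None else [1] * n

    # Standard classifier-free guidance toward the main prompt.
    guided = eps_uncond + guidance_scale * (eps_prompt - eps_uncond)

    for eps_e, s_e, d in zip(eps_edits, edit_scales, directions):
        # Semantic direction for this concept, optionally reversed.
        delta = d * (eps_e - eps_uncond)
        # Keep only the entries with the largest magnitude so the edit stays local.
        cutoff = np.percentile(np.abs(delta), threshold_pct)
        mask = np.abs(delta) >= cutoff
        guided = guided + s_e * delta * mask
    return guided
```

Because each concept adds its own independent term, any number of directions can be stacked, and flipping a direction's sign removes a concept instead of adding it. A full implementation would call a diffusion model at every denoising step to obtain the estimates and may add scheduling refinements omitted here.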