Paper Title
3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
Paper Authors
Paper Abstract
Text-guided diffusion models have shown superior performance in image/video generation and editing, yet they remain largely unexplored in 3D scenarios. In this paper, we discuss three fundamental and interesting problems on this topic. First, we equip text-guided diffusion models with 3D-consistent generation. Specifically, we integrate a NeRF-like neural field to produce low-resolution coarse results for a given camera view. Such results provide 3D priors as conditioning information for the subsequent diffusion process. During denoising diffusion, we further enhance 3D consistency by modeling cross-view correspondences with a novel two-stream (corresponding to two different views) asynchronous diffusion process. Second, we study 3D local editing and propose a two-step solution that can generate 360-degree manipulated results by editing an object from a single view. In Step 1, we perform 2D local editing by blending the predicted noises. In Step 2, we conduct a noise-to-text inversion process that maps the 2D blended noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, 360-degree images can be generated. Last but not least, we extend our model to perform one-shot novel view synthesis by fine-tuning on a single image, showing for the first time the potential of leveraging text guidance for novel view synthesis. Extensive experiments and various applications demonstrate the capabilities of our 3DDesigner. The project page is available at https://3ddesigner-diffusion.github.io/.
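To make the Step 1 editing idea concrete, below is a minimal sketch of blending two predicted noises inside a deterministic DDIM-style denoising loop: the noise predicted under the edit prompt is used inside a spatial mask, and the noise predicted under the source prompt is used outside it, so the edit stays local. All names here (denoiser, src_emb, edit_emb, edit_mask, alphas_cumprod) are hypothetical placeholders, and the paper's NeRF-derived 3D conditioning and two-stream design are omitted; this illustrates the general noise-blending technique under stated assumptions, not the authors' implementation.

```python
# A minimal sketch of 2D local editing by blending predicted noises.
# Assumptions: denoiser(x, t, emb) returns the predicted noise epsilon for
# latent x at timestep t given a text embedding emb; edit_mask is 1 inside
# the region to edit and 0 elsewhere; alphas_cumprod is a 1-D tensor of
# cumulative alpha-bar values. The NeRF-based 3D prior conditioning used by
# 3DDesigner is not modeled here.
import torch

@torch.no_grad()
def blended_local_edit(denoiser, x_T, src_emb, edit_emb, edit_mask,
                       alphas_cumprod):
    x = x_T
    T = len(alphas_cumprod)
    for t in reversed(range(T)):
        # Predict noise twice: under the source prompt and the edit prompt.
        eps_src = denoiser(x, t, src_emb)
        eps_edit = denoiser(x, t, edit_emb)

        # Blend: edited noise inside the mask, original noise outside,
        # so only the masked region is changed by the edit prompt.
        eps = edit_mask * eps_edit + (1.0 - edit_mask) * eps_src

        # Deterministic DDIM-style update (stochastic variance omitted).
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x
```

In the paper's pipeline, the blended noises produced this way would then feed the Step 2 noise-to-text inversion, which maps them into the view-independent text embedding space used to render the full 360-degree result.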