Paper Title
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Paper Authors
Paper Abstract
Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, because these methods are trained from scratch with random initialization and no prior knowledge, they often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in a text-to-shape stage and use it as a 3D shape prior. We then use this shape to initialize a neural radiance field, which is optimized with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods.
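The abstract describes a two-stage pipeline: text is first mapped to an explicit 3D shape prior via a fine-tuned text-to-image diffusion model and an image-to-shape generator, and that prior then initializes a NeRF that is optimized with CLIP guidance on the full prompt. The sketch below illustrates this control flow only; it is not the authors' released code, and all identifiers (`diffusion.generate`, `image_to_shape.predict`, `nerf.initialize_from`, `nerf.render_random_view`, `clip_model.similarity`) are hypothetical placeholders standing in for the corresponding components.

```python
# A minimal sketch of the two-stage Dream3D pipeline as described in the abstract.
# All module and method names are assumed placeholders, not a real API.

import torch


def text_to_shape_prior(prompt: str, diffusion, image_to_shape):
    """Stage 1: bridge text and shape with a text-to-image diffusion model.

    `diffusion` is assumed to be a text-to-image model that has been fine-tuned
    (jointly with a learnable text prompt) to produce rendering-style images,
    narrowing the style gap to the renderings used to train `image_to_shape`.
    """
    rendering_style_image = diffusion.generate(prompt)           # hypothetical call
    shape_prior = image_to_shape.predict(rendering_style_image)  # e.g. a voxel/SDF grid
    return shape_prior


def optimize_nerf_with_clip(prompt: str, shape_prior, nerf, clip_model,
                            steps: int = 2000, lr: float = 1e-2):
    """Stage 2: CLIP-guided optimization of a NeRF initialized from the shape prior."""
    nerf.initialize_from(shape_prior)  # hypothetical: seed density/geometry from the prior
    optimizer = torch.optim.Adam(nerf.parameters(), lr=lr)
    for _ in range(steps):
        image = nerf.render_random_view()             # differentiable rendering of a random view
        loss = -clip_model.similarity(image, prompt)  # maximize image-text agreement in CLIP space
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return nerf
```

Under these assumptions, the full pipeline would simply chain the two stages: `optimize_nerf_with_clip(prompt, text_to_shape_prior(prompt, diffusion, image_to_shape), nerf, clip_model)`. The key design choice highlighted by the abstract is that stage 1 supplies an explicit, text-consistent geometry so stage 2 refines structure rather than discovering it from a random initialization.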