Paper Title

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Paper Authors

Michael Niemeyer, Andreas Geiger

Paper Abstract

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only a few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background as well as individual objects' shapes and appearances while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.
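To make the central idea concrete, below is a minimal sketch (not the authors' code) of the density-weighted composition operator the abstract refers to: each object contributes a feature field that maps a 3D point to a density sigma_i and a feature vector f_i, and the composite field sums the densities and averages the features weighted by density. The stand-in blob fields, the function names, and the `compose_fields` signature are hypothetical illustrations; in GIRAFFE the per-object fields are learned MLPs conditioned on shape and appearance codes and per-object poses.

```python
# Sketch of composing per-object neural feature fields (assumptions noted above).
import numpy as np

def compose_fields(fields, x, eps=1e-8):
    """Combine per-object feature fields at query points x.

    fields: list of callables, each mapping points (N, 3) to
            a pair (sigma (N,), features (N, F)).
    Returns composite density (N,) and features (N, F):
        sigma = sum_i sigma_i
        f     = (1 / sigma) * sum_i sigma_i * f_i
    """
    sigmas, feats = zip(*(field(x) for field in fields))
    sigma = np.sum(sigmas, axis=0)                         # total density (N,)
    weighted = sum(s[:, None] * f for s, f in zip(sigmas, feats))
    f = weighted / (sigma[:, None] + eps)                  # density-weighted mean
    return sigma, f

# Toy stand-in fields: Gaussian-blob "objects" with constant feature vectors.
def make_blob(center, feat):
    def field(x):
        d2 = np.sum((x - center) ** 2, axis=-1)
        return np.exp(-d2), np.tile(feat, (x.shape[0], 1))
    return field

x = np.random.randn(4, 3)  # sample points along hypothetical camera rays
fields = [make_blob(np.zeros(3), np.array([1.0, 0.0])),
          make_blob(np.ones(3), np.array([0.0, 1.0]))]
sigma, f = compose_fields(fields, x)
print(sigma.shape, f.shape)  # (4,) (4, 2)
```

Because the composition is a simple density-weighted average, objects can be added, translated, or rotated independently by transforming the query points of each field before evaluation, which is what enables the per-object control described in the abstract; the composite features are then volume-rendered and passed to the neural rendering pipeline.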
