Paper Title
360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance
Paper Authors
Paper Abstract
We present a method to synthesize novel views from a single $360^\circ$ panorama image based on the neural radiance field (NeRF). Prior studies in a similar setting rely on the neighborhood interpolation capability of multi-layer perceptrons to complete missing regions caused by occlusion, which leads to artifacts in their predictions. We propose 360FusionNeRF, a semi-supervised learning framework where we introduce geometric supervision and semantic consistency to guide the progressive training process. First, the input image is re-projected to $360^\circ$ images, and auxiliary depth maps are extracted at other camera positions. The depth supervision, in addition to the NeRF color guidance, improves the geometry of the synthesized views. Additionally, we introduce a semantic consistency loss that encourages realistic renderings of novel views. We extract these semantic features using a pre-trained visual encoder such as CLIP, a Vision Transformer trained on hundreds of millions of diverse 2D photographs mined from the web with natural language supervision. Experiments indicate that our proposed method can produce plausible completions of unobserved regions while preserving the features of the scene. When trained across various scenes, 360FusionNeRF consistently achieves state-of-the-art performance when transferring to the synthetic Structured3D dataset (PSNR ~5%, SSIM ~3%, LPIPS ~13%), the real-world Matterport3D dataset (PSNR ~3%, SSIM ~3%, LPIPS ~9%), and the Replica360 dataset (PSNR ~8%, SSIM ~2%, LPIPS ~18%).
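To make the joint guidance concrete, below is a minimal PyTorch sketch of the kind of training objective the abstract describes: a NeRF photometric (color) loss, a depth-supervision term from the auxiliary depth maps, and a CLIP-based semantic consistency term. This is not the authors' released code; the function names, tensor shapes, and loss weights (`w_depth`, `w_sem`) are illustrative assumptions, and the semantic term is sketched with OpenAI's public `clip` package.

```python
# Sketch of a joint-guidance objective: color + depth + CLIP semantic consistency.
# Names and weights are assumptions for illustration, not the paper's exact values.
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained CLIP visual encoder (Vision Transformer), frozen during NeRF training.
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.float()  # keep weights in fp32 so gradients flow through rendered images
clip_model.eval()
for p in clip_model.parameters():
    p.requires_grad_(False)


def semantic_consistency_loss(rendered_view: torch.Tensor,
                              reference_view: torch.Tensor) -> torch.Tensor:
    """Cosine distance between CLIP embeddings of a rendered novel view and a
    reference view (e.g. a crop re-projected from the input panorama).
    Both inputs are assumed to be (1, 3, H, W) tensors in [0, 1]."""
    def embed(img: torch.Tensor) -> torch.Tensor:
        # Resize to CLIP's input resolution and apply CLIP's normalization.
        img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
        mean = torch.tensor([0.48145466, 0.4578275, 0.40821073],
                            device=img.device).view(1, 3, 1, 1)
        std = torch.tensor([0.26862954, 0.26130258, 0.27577711],
                           device=img.device).view(1, 3, 1, 1)
        return clip_model.encode_image((img - mean) / std)

    f_render = embed(rendered_view)
    f_ref = embed(reference_view)
    return 1.0 - F.cosine_similarity(f_render, f_ref, dim=-1).mean()


def joint_guidance_loss(rendered_rgb, gt_rgb,
                        rendered_depth, aux_depth,
                        rendered_view, reference_view,
                        w_depth: float = 0.1, w_sem: float = 0.01) -> torch.Tensor:
    """Total objective combining the three guidance signals from the abstract."""
    loss_rgb = F.mse_loss(rendered_rgb, gt_rgb)        # NeRF color guidance
    loss_depth = F.l1_loss(rendered_depth, aux_depth)  # geometric (depth) supervision
    loss_sem = semantic_consistency_loss(rendered_view, reference_view)
    return loss_rgb + w_depth * loss_depth + w_sem * loss_sem
```

In this sketch, the color and depth terms operate on sampled rays, while the semantic term compares whole rendered patches against a reference view, which is why it needs a full image rather than ray batches; the relative weights would be tuned per dataset.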