Paper Title
Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding
Paper Authors
Paper Abstract
We present a cross-domain inference technique that learns from synthetic data to estimate depth and surface normals for in-the-wild omnidirectional 3D scenes encountered in real-world, uncontrolled settings. To this end, we introduce UBotNet, an architecture that combines UNet and Bottleneck Transformer elements to predict consistent scene normals and depth. We also introduce OmniHorizon, a synthetic dataset of 24,335 omnidirectional images representing a wide variety of outdoor environments, including buildings, streets, and diverse vegetation. The dataset is generated from expansive, lifelike virtual spaces and encompasses dynamic scene elements such as changing lighting conditions, different times of day, pedestrians, and vehicles. Our experiments show that UBotNet achieves significantly improved accuracy in both depth and normal estimation compared to existing models. Lastly, we validate cross-domain synthetic-to-real depth and normal estimation on real outdoor images using UBotNet trained solely on our synthetic OmniHorizon dataset, demonstrating the potential of both the dataset and the proposed network for real-world scene understanding applications.