Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light

Title

Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light

Authors

Yuhua Xu, Xiaoli Yang, Yushan Yu, Wei Jia, Zhaobi Chu, Yulan Guo

Abstract

It is well known that passive stereo systems cope poorly with weakly textured objects such as white walls, yet such targets are very common in indoor environments. In this paper, we present a novel stereo system consisting of two cameras (an RGB camera and an IR camera) and an IR speckle projector. The RGB camera is used for both depth estimation and texture acquisition. The IR camera and the speckle projector form a monocular structured-light (MSL) subsystem, while the two cameras form a binocular stereo subsystem. The depth map generated by the MSL subsystem provides external guidance for stereo matching networks, which significantly improves matching accuracy. To verify the effectiveness of the proposed system, we build a prototype and collect a test dataset in indoor scenes. The evaluation results show that, when the RAFT network is used, the Bad 2.0 error of the proposed system is 28.2% of that of the passive stereo system. The dataset and trained models are available at https://github.com/YuhuaXu/MonoStereoFusion.
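The abstract does not define the Bad 2.0 error it reports. As commonly used in stereo benchmarks, it is the percentage of valid pixels whose absolute disparity error exceeds 2.0 pixels. The following is a minimal sketch of that metric under this assumption, not the authors' evaluation code; the function name `bad_threshold_error` and the rule that non-positive or non-finite ground truth marks an invalid pixel are hypothetical choices made for illustration.

```python
import math


def bad_threshold_error(pred_disp, gt_disp, threshold=2.0):
    """Return the percentage of valid pixels with |pred - gt| > threshold.

    pred_disp, gt_disp: 2-D nested sequences of disparities in pixels.
    Assumption (not from the paper): ground-truth values that are
    non-finite or <= 0 mark pixels without valid ground truth.
    """
    n_valid = 0
    n_bad = 0
    for pred_row, gt_row in zip(pred_disp, gt_disp):
        for pred, gt in zip(pred_row, gt_row):
            if not math.isfinite(gt) or gt <= 0:
                continue  # skip pixels without valid ground truth
            n_valid += 1
            if abs(pred - gt) > threshold:
                n_bad += 1
    if n_valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * n_bad / n_valid


# Toy data for illustration only; these are not values from the paper.
gt = [[10.0, 20.0], [30.0, 0.0]]    # 0.0 marks an invalid pixel
pred = [[10.5, 23.0], [30.0, 5.0]]  # only the 23.0 vs 20.0 pixel is "bad"
toy_error = bad_threshold_error(pred, gt)
```

Under this reading, the 28.2% figure in the abstract is a ratio of two such errors (proposed system divided by passive stereo), not an absolute error rate.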
