Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation

Paper Title

Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation

Authors

Jiang, Peizhe; Yang, Wei; Ye, Xiaoqing; Tan, Xiao; Wu, Meng

Abstract

Self-supervised monocular depth estimation (MDE) has emerged as a promising approach because it does not require ground-truth depth. Despite continuous efforts, MDE remains sensitive to scale changes, especially when all training samples come from a single camera. The problem is compounded by camera movement, which heavily couples the predicted depth with the scale change. In this paper, we present a scale-invariant approach for self-supervised MDE in which scale-sensitive features (SSFs) are detached while scale-invariant features (SIFs) are boosted. Specifically, a simple but effective data augmentation that imitates the camera zooming process is proposed to detach SSFs, making the model robust to scale changes. In addition, a dynamic cross-attention module is designed to boost SIFs by adaptively fusing multi-scale cross-attention features. Extensive experiments on the KITTI dataset demonstrate that the detaching and boosting strategies are mutually complementary in MDE, and our approach achieves new state-of-the-art performance, reducing the absolute relative error from 0.097 (existing works) to 0.090. The code will be made public soon.
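The absolute relative error quoted in the abstract is conventionally computed as the mean of |predicted − ground truth| / ground truth over pixels with valid ground truth. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `abs_rel` and the choice to treat non-positive ground-truth values as invalid are assumptions made for illustration.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error between predicted and ground-truth depths.

    Assumption: ground-truth values <= 0 mark pixels without a valid
    measurement and are skipped.
    """
    # Keep only the pairs whose ground-truth depth is valid (positive).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    # Average the per-pixel relative deviation |p - g| / g.
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Two pixels, each off by 10% of the true depth.
    print(abs_rel([9.0, 22.0], [10.0, 20.0]))
```

Under this definition, a score of 0.090 means the predicted depth deviates from the ground truth by about 9% on average.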
