Title
Instance-aware multi-object self-supervision for monocular depth prediction
Authors
Abstract
This paper proposes a self-supervised monocular image-to-depth prediction framework trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances. Self-supervision is performed by warping images across a video sequence using depth and scene motion, including object instances. One novelty of the proposed method is the use of the multi-head attention of the transformer network, which matches moving objects across time and models their interaction and dynamics. This enables accurate and robust pose estimation for each object instance. Most image-to-depth prediction frameworks assume rigid scenes, which largely degrades their performance on dynamic objects. Only a few SOTA papers have accounted for dynamic objects. The proposed method is shown to outperform these methods on standard benchmarks, and the impact of dynamic motion on these benchmarks is exposed. Furthermore, the proposed image-to-depth prediction framework is also shown to be competitive with SOTA video-to-depth prediction frameworks.
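The photometric self-supervision described above rests on a standard view-synthesis step: target pixels are back-projected with the predicted depth, moved by a rigid 6-DOF transform, re-projected into a source frame, and the resulting reconstruction is compared against the target image. The numpy sketch below illustrates that loss for a single rigid motion (the paper applies one such transform per object instance plus one for the camera); the function name, the target-to-source convention for `T`, and the nearest-neighbour sampling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def warp_photometric_loss(img_src, img_tgt, depth_tgt, K, T):
    """Warp img_src into the target view using the target depth map and a
    relative pose T (4x4, assumed target -> source), then return the mean
    L1 photometric error against img_tgt.  Nearest-neighbour sampling keeps
    the sketch short; real systems use differentiable bilinear sampling."""
    h, w = depth_tgt.shape
    # Pixel grid in homogeneous coordinates, shape (3, h*w).
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Back-project target pixels to 3D points in the target camera frame.
    cam = np.linalg.inv(K) @ pix * depth_tgt.ravel()
    # Rigidly move the points into the source frame and project them.
    cam_h = np.vstack([cam, np.ones((1, h * w))])
    src = K @ (T @ cam_h)[:3]
    u = np.round(src[0] / src[2]).astype(int)
    v = np.round(src[1] / src[2]).astype(int)
    # Keep only points that land inside the source image, in front of it.
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (src[2] > 0)
    tgt_idx = (ys.ravel()[valid], xs.ravel()[valid])
    warped = img_src[v[valid], u[valid]]
    return np.abs(warped - img_tgt[tgt_idx]).mean()
```

With a perfect depth map and pose, the reconstruction matches the target and the loss is zero; the training signal comes from minimizing this error with respect to the predicted depth and per-instance poses.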