Paper Title

Visual Descriptor Learning from Monocular Video

Paper Authors

Umashankar Deekshith, Nishit Gajjar, Max Schwarz, Sven Behnke

Paper Abstract

Correspondence estimation is one of the most widely researched, yet only partially solved, areas of computer vision, with many applications in tracking, mapping, and the recognition of objects and environments. In this paper, we propose a novel way to estimate dense correspondences on RGB images, where visual descriptors are learned from video examples by training a fully convolutional network. Most deep learning approaches solve this either by training the network on large sets of expensive labeled data or by generating labels through strong 3D generative models using RGB-D videos. Our method learns from RGB videos using a contrastive loss, where relative labeling is estimated from optical flow. We demonstrate the functionality in a quantitative analysis on rendered videos, where ground-truth information is available. Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background. The learned descriptors are unique, and the representations determined by the network are global. We further show the applicability of the method to real-world videos.
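
To make the training objective in the abstract concrete, the sketch below shows a pixelwise contrastive loss over dense descriptor maps: pixel pairs labeled as matches (e.g., related by optical flow between two frames, as the abstract describes) are pulled together in descriptor space, while non-matching pairs are pushed apart by a margin. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the function name, tensor layout, and margin value are hypothetical.

```python
import torch
import torch.nn.functional as F

def pixelwise_contrastive_loss(desc_a, desc_b,
                               matches_a, matches_b,
                               nonmatches_a, nonmatches_b,
                               margin=0.5):
    """Contrastive loss over sampled pixel pairs from two frames.

    desc_a, desc_b: dense descriptor maps, shape (D, H, W), as produced
        by a fully convolutional network.
    matches_a/b: (N, 2) integer (row, col) pixel coordinates that
        correspond, e.g. obtained by following optical flow from frame A
        to frame B (hypothetical labeling, per the abstract).
    nonmatches_a/b: (M, 2) coordinates of non-corresponding pixels.
    """
    def gather(desc, px):
        # Index the (D, H, W) map at the given pixel locations -> (N, D)
        return desc[:, px[:, 0], px[:, 1]].t()

    d_match = F.pairwise_distance(gather(desc_a, matches_a),
                                  gather(desc_b, matches_b))
    d_nonmatch = F.pairwise_distance(gather(desc_a, nonmatches_a),
                                     gather(desc_b, nonmatches_b))

    # Pull matched descriptors together; push non-matches apart until
    # they are at least `margin` away from each other.
    loss_match = (d_match ** 2).mean()
    loss_nonmatch = (torch.clamp(margin - d_nonmatch, min=0) ** 2).mean()
    return loss_match + loss_nonmatch
```

In a setup like this, non-match pairs would typically be sampled at random pixels away from the flow-predicted location, so the network learns descriptors that are distinctive across the image rather than merely smooth along the flow field.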
