Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene

Paper title

Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene

Authors

Sulabh Shrestha, Yimeng Li, Jana Kosecka

Abstract

The ability to endow maps of indoor scenes with semantic information is an integral part of robotic agents that perform tasks such as target-driven navigation, object search, or object rearrangement. State-of-the-art methods use Deep Convolutional Neural Networks (DCNNs) to predict the semantic segmentation of an image as a useful representation for these tasks. The accuracy of semantic segmentation depends on the availability and amount of labeled data from the target environment, or on the ability to bridge the domain gap between the training and test environments. We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model that exploits the agent's ability to move and register multiple views in a novel environment. Using spatial and temporal consistency cues for pixel-level data association, we train a DCNN model with a variant of contrastive learning to predict semantic segmentation from RGB views in the target environment. The proposed method outperforms models pre-trained on ImageNet and achieves competitive performance relative to models trained for exactly the same task but on a different dataset. We also perform various ablation studies to analyze and demonstrate the efficacy of the proposed method.
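
The two ingredients named above, pixel-level data association between registered views and a contrastive loss over the associated pixels, can be illustrated with a minimal sketch. It rests on stated assumptions: correspondences are found by back-projecting a pixel with its depth and re-projecting it into a second view with a known relative pose under a shared pinhole camera, and the loss is a generic pixel-level InfoNCE, not necessarily the exact contrastive variant used by RegConsist. All function names and the `temperature` parameter are hypothetical.

```python
import math


def reproject_pixel(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth in view A to pixel coords in view B.

    Assumption (hypothetical setup): both views share intrinsics K (3x3 nested
    lists) and the relative pose satisfies X_B = R @ X_A + t.
    Returns None if the point lands behind camera B.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Back-project to a 3D point in camera A coordinates (pinhole model).
    p_a = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Rigidly transform the point into camera B coordinates.
    p_b = [sum(R[i][j] * p_a[j] for j in range(3)) + t[i] for i in range(3)]
    if p_b[2] <= 0:
        return None  # no valid correspondence
    # Project back onto the image plane of view B.
    return (fx * p_b[0] / p_b[2] + cx, fy * p_b[1] / p_b[2] + cy)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale an embedding to unit L2 norm so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]


def pixel_info_nce(feats_a, feats_b, pairs, temperature=0.1):
    """Average InfoNCE loss over associated pixels (generic sketch).

    feats_a, feats_b: unit-norm embeddings, one per pixel, for views A and B.
    pairs: (i, j) means pixel i in A corresponds to pixel j in B.
    For each anchor i, its match j is the positive; the other matched pixels
    of view B in `pairs` act as negatives.
    """
    b_idx = [j for _, j in pairs]
    total = 0.0
    for i, j in pairs:
        logits = [dot(feats_a[i], feats_b[k]) / temperature for k in b_idx]
        pos = dot(feats_a[i], feats_b[j]) / temperature
        # Numerically stable log-sum-exp over positive and negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - pos  # equals -log(softmax probability of positive)
    return total / len(pairs)
```

In use, pixel pairs produced by `reproject_pixel` (after rounding and checking that they fall inside the image) index into the per-pixel embeddings of the two views, and minimizing `pixel_info_nce` pulls embeddings of the same 3D surface point together while pushing other pixels apart. A real training loop would compute these on batched tensors of DCNN features rather than Python lists.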
