Paper Title

RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training

Paper Authors

Jean Lahoud, Bernard Ghanem

Paper Abstract

Although well-known large-scale datasets, such as ImageNet, have driven image understanding forward, most of these datasets require extensive manual annotation and are thus not easily scalable. This limits the advancement of image understanding techniques. The impact of these large-scale datasets can be observed in almost every vision task and technique in the form of pre-training for initialization. In this work, we propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method. In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors. These labels, denoted HN-labels, represent different height and normal patches, which allow mining of local semantic information that is useful for the task of semantic RGB segmentation. We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training, while using 25x fewer images and without requiring any manual labeling. We pre-train a semantic segmentation network with our HN-labels, which resembles our final task more closely than pre-training on a less related task, e.g. classification with ImageNet. We evaluate on two datasets (NYUv2 and CamVid), and we show how the similarity in tasks is advantageous not only in speeding up the pre-training process, but also in achieving better final semantic segmentation accuracy than ImageNet pre-training.
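The abstract describes building pseudo-labels from depth-derived height and surface-normal cues and using them as pre-training targets for a segmentation network. The following is a minimal illustrative sketch, not the authors' implementation: it assumes height can be approximated by the camera-space y coordinate, estimates normals by finite differences, and quantizes both into a combined HN-style label map; all function names, intrinsics, and bin counts are hypothetical.

```python
# Minimal sketch (assumption): deriving HN-style pseudo-labels from a depth map.
# The exact label construction here (simple quantization of height and normal
# orientation) is an illustration of the idea, not the paper's exact method.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) in meters into camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def surface_normals(points):
    """Estimate per-pixel normals from finite differences of the 3D point map."""
    d_cols = np.gradient(points, axis=1)   # derivative along image columns
    d_rows = np.gradient(points, axis=0)   # derivative along image rows
    n = np.cross(d_cols, d_rows)
    n /= (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)
    return n

def hn_labels(depth, fx, fy, cx, cy, n_height_bins=5, n_normal_bins=4):
    """Combine quantized height and quantized normal orientation into one label map.

    Height is taken as the camera-space y coordinate (a stand-in for height above
    the floor); normals are binned by their angle in the x-z plane. Both choices
    are illustrative assumptions.
    """
    pts = depth_to_points(depth, fx, fy, cx, cy)
    normals = surface_normals(pts)

    height = pts[..., 1]
    h_bins = np.digitize(height, np.linspace(height.min(), height.max(), n_height_bins - 1))

    azimuth = np.arctan2(normals[..., 0], normals[..., 2])  # normal angle in the x-z plane
    a_bins = np.digitize(azimuth, np.linspace(-np.pi, np.pi, n_normal_bins + 1)[1:-1])

    return h_bins * n_normal_bins + a_bins  # (H, W) integer pseudo-label map

# Usage: pseudo-labels for a synthetic depth map; such a label map could serve as
# the target when pre-training a segmentation network with a cross-entropy loss,
# before fine-tuning on manually labeled data (e.g. NYUv2 or CamVid).
depth = np.random.uniform(0.5, 5.0, size=(480, 640)).astype(np.float32)
labels = hn_labels(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(labels.shape, labels.min(), labels.max())
```

Because these labels come for free from a depth sensor, the pre-training stage requires no manual annotation, which is the scalability argument made in the abstract.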
