Paper Title
Self-Supervised Visual Representation Learning from Hierarchical Grouping
Paper Authors
Paper Abstract
We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. A small supervised dataset suffices for training this grouping primitive. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking.
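To make the learning signal described above concrete, below is a minimal PyTorch-style sketch of a contrastive loss over per-pixel embeddings guided by a precomputed region partition. The function name region_contrastive_loss, the random anchor-sampling scheme, and the temperature value are illustrative assumptions, not the paper's released implementation; a full treatment would draw positives and negatives at multiple levels of the merge tree so that pairwise distances reflect the hierarchy rather than a single partition.

# Sketch: contrastive loss encouraging per-pixel embeddings to agree within
# a region and disagree across regions. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def region_contrastive_loss(embeddings, regions, num_anchors=256, temperature=0.3):
    """embeddings: (C, H, W) per-pixel features from a deep network.
    regions: (H, W) integer region ids from one level of the hierarchical segmentation.
    Pixels sharing a region id act as positives; all other pixels as negatives.
    """
    C, H, W = embeddings.shape
    feats = F.normalize(embeddings.reshape(C, -1).t(), dim=1)   # (H*W, C), unit norm
    labels = regions.reshape(-1)                                 # (H*W,)

    # Randomly sample anchor pixels.
    idx = torch.randperm(H * W)[:num_anchors]
    anchor_feats = feats[idx]                                    # (A, C)
    anchor_labels = labels[idx]                                  # (A,)

    # Cosine similarities between anchors and all pixels, scaled by temperature.
    logits = anchor_feats @ feats.t() / temperature              # (A, H*W)

    # Positive mask: same region id, excluding the anchor pixel itself.
    pos_mask = anchor_labels[:, None] == labels[None, :]         # (A, H*W)
    pos_mask[torch.arange(len(idx)), idx] = False

    # InfoNCE-style objective: log-likelihood of positives against all pixels.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Usage with dummy inputs: embeddings from a backbone, region ids from the grouping primitive.
emb = torch.randn(64, 96, 96, requires_grad=True)   # C=64 channels, 96x96 pixels
reg = torch.randint(0, 20, (96, 96))                 # 20 regions at one hierarchy level
loss = region_contrastive_loss(emb, reg)
loss.backward()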