Paper Title
Self-Supervised Visual Representation Learning from Hierarchical Grouping
Paper Authors
Paper Abstract
We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. A small supervised dataset suffices for training this grouping primitive. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking.
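To make the learning signal described above concrete, below is a minimal PyTorch-style sketch of a contrastive loss over per-pixel embeddings guided by a precomputed region partition. The function name region_contrastive_loss, the random anchor-sampling scheme, and the temperature value are illustrative assumptions, not the paper's released implementation; a full treatment would draw positives and negatives at multiple levels of the merge tree so that pairwise distances reflect the hierarchy rather than a single partition.

# Sketch: contrastive loss encouraging per-pixel embeddings to agree within
# a region and disagree across regions. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def region_contrastive_loss(embeddings, regions, num_anchors=256, temperature=0.3):
    """embeddings: (C, H, W) per-pixel features from a deep network.
    regions: (H, W) integer region ids from one level of the hierarchical segmentation.
    Pixels sharing a region id act as positives; all other pixels as negatives.
    """
    C, H, W = embeddings.shape
    feats = F.normalize(embeddings.reshape(C, -1).t(), dim=1)   # (H*W, C), unit norm
    labels = regions.reshape(-1)                                 # (H*W,)

    # Randomly sample anchor pixels.
    idx = torch.randperm(H * W)[:num_anchors]
    anchor_feats = feats[idx]                                    # (A, C)
    anchor_labels = labels[idx]                                  # (A,)

    # Cosine similarities between anchors and all pixels, scaled by temperature.
    logits = anchor_feats @ feats.t() / temperature              # (A, H*W)

    # Positive mask: same region id, excluding the anchor pixel itself.
    pos_mask = anchor_labels[:, None] == labels[None, :]         # (A, H*W)
    pos_mask[torch.arange(len(idx)), idx] = False

    # InfoNCE-style objective: log-likelihood of positives against all pixels.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Usage with dummy inputs: embeddings from a backbone, region ids from the grouping primitive.
emb = torch.randn(64, 96, 96, requires_grad=True)   # C=64 channels, 96x96 pixels
reg = torch.randint(0, 20, (96, 96))                 # 20 regions at one hierarchy level
loss = region_contrastive_loss(emb, reg)
loss.backward()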