Paper Title

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation

Paper Authors

Renhao Wang, Hang Zhao, Yang Gao

Paper Abstract

Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO. This gap exists largely because commonly used random crop augmentations obtain semantically inconsistent content in crowded scene images of diverse objects. Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution is still elusive. In this work, we propose a framework which accomplishes this goal via joint learning of representations and segmentation. We leverage segmentation masks to train a model with a mask-dependent contrastive loss, and use the partially trained model to bootstrap better masks. By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout pretraining. Experiments show our representations transfer robustly to downstream tasks in classification, detection and segmentation.
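The abstract describes an alternating procedure: train an encoder with a contrastive loss that depends on segmentation masks, then use the partially trained encoder to bootstrap better masks, and repeat. The sketch below illustrates one way such a loop could be wired up in PyTorch; it is not the paper's implementation. The toy encoder, the region-pooled InfoNCE-style loss, the k-means mask bootstrapping, and all hyperparameters are illustrative assumptions, and the two views are assumed to be spatially aligned (photometric augmentations only) so a single mask applies to both.

```python
# Illustrative sketch of the alternating scheme described in the abstract:
# (1) optimize a mask-dependent contrastive loss, (2) re-estimate masks from
# the partially trained encoder, then repeat. All names and numbers are
# hypothetical stand-ins, not taken from the CYBORGS paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoder(nn.Module):
    """Toy fully-convolutional encoder producing per-pixel embeddings."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, padding=1),
        )
    def forward(self, x):                       # (B, 3, H, W) -> (B, dim, H, W)
        return F.normalize(self.net(x), dim=1)

def masked_region_features(feats, masks, k):
    """Average-pool pixel embeddings inside each of k mask segments."""
    onehot = F.one_hot(masks, k).permute(0, 3, 1, 2).float()    # (B, k, H, W)
    area = onehot.sum(dim=(2, 3)).clamp(min=1.0)                # (B, k)
    pooled = torch.einsum("bchw,bkhw->bkc", feats, onehot) / area.unsqueeze(-1)
    return F.normalize(pooled, dim=-1)                          # (B, k, C)

def mask_contrastive_loss(f1, f2, tau=0.2):
    """Contrast region features of the same segment across two views."""
    B, k, _ = f1.shape
    logits = torch.einsum("bkc,bjc->bkj", f1, f2) / tau         # (B, k, k)
    target = torch.arange(k, device=f1.device).expand(B, k)
    return F.cross_entropy(logits.reshape(B * k, k), target.reshape(-1))

@torch.no_grad()
def bootstrap_masks(feats, k, iters=10):
    """Re-estimate masks by simple k-means over per-pixel embeddings."""
    B, C, H, W = feats.shape
    x = feats.permute(0, 2, 3, 1).reshape(B, H * W, C)          # (B, HW, C)
    centers = x[:, torch.randperm(H * W)[:k], :]                # (B, k, C)
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=-1)         # (B, HW)
        for c in range(k):
            sel = (assign == c).unsqueeze(-1).float()
            centers[:, c] = (x * sel).sum(1) / sel.sum(1).clamp(min=1.0)
    return assign.reshape(B, H, W)

# Alternating loop on random data standing in for two augmented views per image.
encoder, k = PixelEncoder(), 4
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
view1, view2 = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
masks = torch.randint(0, k, (2, 32, 32))                        # initial coarse masks
for step in range(4):
    f1 = masked_region_features(encoder(view1), masks, k)
    f2 = masked_region_features(encoder(view2), masks, k)
    loss = mask_contrastive_loss(f1, f2)
    opt.zero_grad(); loss.backward(); opt.step()
    masks = bootstrap_masks(encoder(view1), k)                  # refresh masks
```

In this reading, the loss grounds each contrastive update in the current segmentation (positives are pooled over the same segment across views), while the bootstrapping step regroups pixels using the freshly updated embeddings, so representation quality and mask quality can improve together over pretraining.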
