Paper Title

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Paper Authors

Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

Paper Abstract

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation and inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone networks during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: https://github.com/JDAI-CV/LIO.
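
As a rough illustration of the plug-in design described in the abstract, below is a minimal PyTorch sketch of the two training-time heads. All names here (ObjectExtentHead, SpatialContextHead, LIOWrapper, with_aux) are hypothetical, and the self-supervision targets (pseudo extent masks derived from same-category instances, relative-position labels within the extent) and their losses are omitted; this is a sketch under assumptions, not the authors' implementation. Refer to the project page for the official code.

```python
import torch
import torch.nn as nn
import torchvision


class ObjectExtentHead(nn.Module):
    """Hypothetical head predicting per-location 'objectness' logits.

    In LIO, object extent is supervised via visual patterns shared among
    same-category instances; that pseudo-mask construction is omitted here.
    """

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.conv(feat)  # (B, C, H, W) -> (B, 1, H, W) mask logits


class SpatialContextHead(nn.Module):
    """Hypothetical head predicting, per location, a relative position with
    respect to a reference point inside the object extent (two channels,
    e.g. normalized polar coordinates)."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 2, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.conv(feat)  # (B, C, H, W) -> (B, 2, H, W)


class LIOWrapper(nn.Module):
    """Wraps any convolutional backbone with a classifier plus the two
    self-supervised heads; the heads run only when with_aux=True, so they
    add no cost at inference time."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.oel_head = ObjectExtentHead(feat_dim)
        self.scl_head = SpatialContextHead(feat_dim)

    def forward(self, x: torch.Tensor, with_aux: bool = True):
        feat = self.backbone(x)                        # (B, C, H, W)
        logits = self.classifier(feat.mean(dim=(2, 3)))
        if not with_aux:
            return logits                              # plain recognition path
        return logits, self.oel_head(feat), self.scl_head(feat)


if __name__ == "__main__":
    # ResNet-50 trunk without its pooling/FC head; feature dim is 2048.
    trunk = nn.Sequential(*list(torchvision.models.resnet50().children())[:-2])
    model = LIOWrapper(trunk, feat_dim=2048, num_classes=200)  # e.g. CUB-200
    logits, mask_logits, rel_pos = model(torch.randn(2, 3, 224, 224))
    print(logits.shape, mask_logits.shape, rel_pos.shape)
```

The with_aux flag mirrors the abstract's claim that the modules can be plugged in during training and detached at inference: the deployed model reduces to backbone plus classifier.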
