Title

LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity

Authors

Karmali, Tejan; Atrishi, Abhinav; Harsha, Sai Sree; Agrawal, Susmit; Jampani, Varun; Babu, R. Venkatesh

Abstract

In this work, we introduce LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image, which are further used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations. The property of equivariance is of interest for dense prediction tasks like landmark estimation. In this work, we introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL objective, which operates at an instance level. The correspondences obtained through this network are further used to train a dense and compact representation of the image using a lightweight network. We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations, while also improving generalization across scale variations.
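
The abstract names the BYOL objective for the first training stage but does not spell it out. As a rough illustration only, the sketch below computes the regression loss generally associated with BYOL: the squared distance between the L2-normalized online prediction and the L2-normalized target projection, which equals 2 - 2 * cosine similarity. The function names and the plain-list vector representation are assumptions made for this example; this is not the LEAD authors' code, and it leaves out the symmetrization over the two augmented views, the stop-gradient on the target branch, and the moving-average target update that a full BYOL setup would need.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (hypothetical helper for this sketch)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def byol_regression_loss(online_prediction, target_projection):
    """Squared L2 distance between the two unit-normalized vectors.

    This equals 2 - 2 * cosine_similarity, so it ranges from 0
    (same direction) to 4 (opposite directions).
    """
    if len(online_prediction) != len(target_projection):
        raise ValueError("vectors must have the same length")
    q = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(q, z))


if __name__ == "__main__":
    # Two embeddings of the same image under different augmentations
    # should ideally point in the same direction, giving a loss near 0.
    print(byol_regression_loss([1.0, 2.0], [2.0, 4.0]))
    print(byol_regression_loss([1.0, 0.0], [0.0, 1.0]))
```

Because both inputs are normalized first, the loss depends only on the angle between the two embeddings, not on their magnitudes.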
