Paper Title

Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks

Paper Authors

Jaonary Rabarisoa, Valentin Belissen, Florian Chabot, Quoc-Cuong Pham

Paper Abstract

We present a new self-supervised pre-training of Vision Transformers for dense prediction tasks. It is based on a contrastive loss across views that compares pixel-level representations to global image representations. This strategy produces better local features suitable for dense prediction tasks as opposed to contrastive pre-training based on global image representation only. Furthermore, our approach does not suffer from a reduced batch size since the number of negative examples needed in the contrastive loss is in the order of the number of local features. We demonstrate the effectiveness of our pre-training strategy on two dense prediction tasks: semantic segmentation and monocular depth estimation.
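The abstract only outlines the loss, so the following is a minimal sketch of one plausible reading, not the authors' implementation: an InfoNCE-style objective in which each patch-level feature from one augmented view is pulled towards the global (e.g. CLS-token) representation of the other view of the same image, while the remaining patch features act as negatives, so the number of negatives scales with the number of local features rather than the batch size. The function name, temperature value, and choice of negatives below are illustrative assumptions.

```python
# A minimal sketch (not the authors' code): InfoNCE-style loss that pulls each
# patch-level feature of view 1 towards the global representation of view 2 of
# the same image, with the remaining patches serving as negatives.
import torch
import torch.nn.functional as F


def local_to_global_contrastive_loss(local_feats: torch.Tensor,
                                     global_feat: torch.Tensor,
                                     temperature: float = 0.1) -> torch.Tensor:
    """local_feats: (N, D) patch/pixel-level features from one augmented view.
    global_feat:  (D,)   global (e.g. CLS-token) feature from the other view.
    Negatives are the other patches of the same image, so the number of
    negatives is on the order of N, independent of the batch size."""
    local_feats = F.normalize(local_feats, dim=-1)             # (N, D)
    global_feat = F.normalize(global_feat, dim=-1)             # (D,)

    # Positive logits: similarity of every local feature to the global target.
    pos = (local_feats @ global_feat) / temperature            # (N,)

    # Negative logits: local-to-local similarities, excluding self-pairs.
    neg = (local_feats @ local_feats.t()) / temperature        # (N, N)
    self_mask = torch.eye(local_feats.size(0), dtype=torch.bool,
                          device=local_feats.device)
    neg = neg.masked_fill(self_mask, float('-inf'))

    # Column 0 holds the positive; cross-entropy against target index 0.
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)         # (N, 1 + N)
    targets = torch.zeros(local_feats.size(0), dtype=torch.long,
                          device=local_feats.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage: 196 patch tokens (a 14x14 grid) of dimension 256.
    patches = torch.randn(196, 256)
    cls_token = torch.randn(256)
    print(local_to_global_contrastive_loss(patches, cls_token).item())
```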
