Paper title
CrossTransformers: spatially-aware few-shot transfer
Paper authors
Paper abstract
Given new tasks with very little data, such as new classes in a classification problem or a domain shift in the input, the performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer-based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled images, and then infer class membership by computing distances between spatially-corresponding features. The result is a classifier that is more robust to task and domain shift, which we demonstrate via state-of-the-art performance on Meta-Dataset, a recent dataset for evaluating transfer from ImageNet to many other vision datasets.
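To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the cross-attention idea: spatial features of the query attend to spatial features of each class's support images, yielding query-aligned class prototypes, and class scores are negative squared distances to those prototypes. This is not the authors' implementation; the function name, the separate single-head projections w_q/w_k/w_v, and the tensor shapes are illustrative assumptions.

```python
# Minimal sketch of CrossTransformer-style attention, assuming precomputed
# convolutional feature maps flattened over spatial locations.
import torch
import torch.nn.functional as F

def cross_transformer_logits(query_feats, support_feats, w_q, w_k, w_v):
    """
    query_feats:   (P, D)        P spatial locations of the query image
    support_feats: (C, N, P, D)  C classes, N support images per class
    w_q, w_k, w_v: (D, D)        illustrative linear projection weights
    """
    C, N, P, D = support_feats.shape
    q = query_feats @ w_q                             # (P, D)
    logits = []
    for c in range(C):
        s = support_feats[c].reshape(N * P, D)        # flatten support locations
        k = s @ w_k                                   # (N*P, D)
        v = s @ w_v                                   # (N*P, D)
        # Coarse spatial correspondence: each query location softly attends
        # to every support-set location of class c.
        attn = F.softmax(q @ k.T / D ** 0.5, dim=-1)  # (P, N*P)
        proto = attn @ v                              # query-aligned prototype, (P, D)
        # Distance between spatially-corresponding features.
        dist = ((query_feats @ w_v - proto) ** 2).sum()
        logits.append(-dist)
    return torch.stack(logits)                        # (C,)

# Example episode: 5-way, 2-shot, 7x7 feature grid, 64-dim features.
C, N, P, D = 5, 2, 49, 64
logits = cross_transformer_logits(
    torch.randn(P, D), torch.randn(C, N, P, D),
    torch.randn(D, D), torch.randn(D, D), torch.randn(D, D))
probs = F.softmax(logits, dim=0)  # class membership probabilities
```

Because classification reduces to distances between aligned local features rather than a learned class head, a sketch like this can in principle score classes unseen during training, which is what makes the design relevant to few-shot transfer.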