Feature-Proxy Transformer for Few-Shot Segmentation

Paper Title

Feature-Proxy Transformer for Few-Shot Segmentation

Authors

Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

Abstract

Few-shot segmentation (FSS) aims to perform semantic segmentation on novel classes given a few annotated support samples. Rethinking recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while supervised segmentation methods use a simple linear classification head. The intricacy of the decoder and its matching pipeline makes such an FSS framework hard to follow. This paper revives the straightforward framework of "feature extractor $+$ linear classification head" and proposes a novel Feature-Proxy Transformer (FPTrans) method, in which the "proxy" is the vector representing a semantic class in the linear classification head. FPTrans has two key points for learning discriminative features and representative proxies: 1) To better utilize the limited support samples, the feature extractor makes the query interact with the support features from the bottom to the top layers using a novel prompting strategy. 2) FPTrans uses multiple local background proxies (instead of a single one) because the background is not homogeneous and may contain some novel foreground regions. Both key points are easily integrated into the vision transformer backbone through the transformer's prompting mechanism. Given the learned features and proxies, FPTrans directly compares their cosine similarity for segmentation. Although the framework is straightforward, we show that FPTrans achieves competitive FSS accuracy, on par with state-of-the-art decoder-based methods.
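
The final step described in the abstract, comparing features against proxies by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `segment`, the toy 2-D features, and the decision rule (a pixel is foreground when its similarity to the foreground proxy exceeds its best similarity to any background proxy) are assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def segment(features, fg_proxy, bg_proxies):
    """Label each pixel feature 1 (foreground) or 0 (background).

    Hypothetical decision rule: a pixel is foreground when it is more similar
    to the foreground proxy than to every one of the local background proxies.
    """
    labels = []
    for f in features:
        fg_score = cosine(f, fg_proxy)
        bg_score = max(cosine(f, p) for p in bg_proxies)
        labels.append(1 if fg_score > bg_score else 0)
    return labels

# Toy example: four pixel features, one foreground proxy, two background proxies.
features = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2], [0.9, 0.3]]
fg_proxy = [1.0, 0.0]
bg_proxies = [[0.0, 1.0], [-1.0, 0.0]]
mask = segment(features, fg_proxy, bg_proxies)
print(mask)  # [1, 0, 0, 1]
```

With a single background proxy, the second and third pixels would have to share one background vector even though they point in very different directions; keeping several background proxies lets each cover a different background region, which is the motivation the abstract gives for the second key point.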
