FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

Paper title

FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

Authors

Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos

Abstract

This paper is on Few-Shot Object Detection (FSOD), where given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be used as is, without requiring any fine-tuning at test time, (b) it must be able to process an arbitrary number of novel objects concurrently while supporting an arbitrary number of examples from each class, and (c) it must achieve accuracy comparable to a closed system. Towards satisfying (a)-(c), in this work, we make the following contributions: We introduce, for the first time, a simple, yet powerful, few-shot detection transformer (FS-DETR) based on visual prompting that can address both desiderata (a) and (b). Our system builds upon the DETR framework, extending it based on two key ideas: (1) feed the provided visual templates of the novel classes as visual prompts during test time, and (2) "stamp" these prompts with pseudo-class embeddings (akin to soft prompting), which are then predicted at the output of the decoder. Importantly, we show that our system is not only more flexible than existing methods, but also makes a step towards satisfying desideratum (c). Specifically, it is significantly more accurate than all methods that do not require fine-tuning and even matches and outperforms the current state-of-the-art fine-tuning based methods on the most well-established benchmarks (PASCAL VOC & MS COCO).
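
The two key ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes template and image features have already been extracted as vectors of the same width; the function name `build_prompted_sequence`, the fixed slot assignment, the additive "stamping", and all shapes are hypothetical choices made only for illustration. The sketch adds a pseudo-class embedding to every template of a novel class, then prepends the stamped templates to the image tokens as prompts.

```python
import numpy as np


def build_prompted_sequence(image_tokens, templates_per_class, pseudo_class_emb):
    """Stamp template features with pseudo-class embeddings and prepend them.

    image_tokens:        (N, D) array of image features (assumed precomputed).
    templates_per_class: list with one (K_c, D) array per novel class; K_c may
                         differ between classes.
    pseudo_class_emb:    (M, D) table of pseudo-class embeddings (hypothetical).
    """
    if len(templates_per_class) > pseudo_class_emb.shape[0]:
        raise ValueError("more novel classes than pseudo-class slots")

    prompts, prompt_slots = [], []
    for slot, feats in enumerate(templates_per_class):
        # Assumption: class c simply takes slot c, and "stamping" is an addition.
        prompts.append(feats + pseudo_class_emb[slot])  # (K_c, D) + (D,)
        prompt_slots.extend([slot] * feats.shape[0])

    prompts = np.concatenate(prompts, axis=0)
    # Prompts come first, followed by the untouched image tokens.
    sequence = np.concatenate([prompts, image_tokens], axis=0)
    return sequence, np.array(prompt_slots)


rng = np.random.default_rng(0)
D = 8
image_tokens = rng.normal(size=(16, D))
# Two novel classes: three templates for the first, one for the second.
templates = [rng.normal(size=(3, D)), rng.normal(size=(1, D))]
pseudo_class_emb = rng.normal(size=(5, D))

sequence, prompt_slots = build_prompted_sequence(
    image_tokens, templates, pseudo_class_emb
)
print(sequence.shape, prompt_slots.tolist())
```

In this sketch a detector would predict the slot index of a matching prompt instead of a semantic label, which is why new classes could be swapped in at test time without fine-tuning. The variable-length list of templates mirrors desideratum (b).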

Paper title