Paper title
Image Classification with Small Datasets: Overview and Benchmark
Paper authors
Paper abstract
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.