Paper Title
Replacing Labeled Real-image Datasets with Auto-generated Contours
Paper Authors
Paper Abstract
In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k, while FDSL shows 82.7% top-1 accuracy when pre-trained under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy/copyright issues, labeling cost and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses, namely that (i) object contours are what matter in FDSL datasets, and (ii) increasing the number of parameters used to create labels improves the performance of FDSL pre-training. To test the former hypothesis, we constructed a dataset consisting of simple combinations of object contours. We found that this dataset can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.
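To make the formula-driven idea concrete, below is a minimal sketch of how a labeled contour image can be produced entirely from a formula, so that the class label is a pure function of the generation parameters and no real images or human annotation are involved. The radial_contour generator, its parameters (n_vertices, noise), and the labeling scheme are illustrative assumptions for this sketch, not the authors' actual dataset code.

    # Minimal sketch of formula-driven image generation: render one random
    # closed contour and derive the class label from the generation
    # parameters alone (no real images, no human annotation).
    # NOTE: illustrative toy code, not the paper's actual dataset pipeline.
    import numpy as np
    from PIL import Image, ImageDraw

    def radial_contour(n_vertices: int, noise: float,
                       rng: np.random.Generator, size: int = 224) -> Image.Image:
        """Draw a closed contour fully determined by formula parameters:
        the vertex count and the radial perturbation amplitude."""
        angles = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
        # Perturb a unit circle radially; the contour stays closed by construction.
        radii = 1.0 + noise * rng.uniform(-1.0, 1.0, size=n_vertices)
        cx = cy = size / 2
        scale = size * 0.4
        points = [(cx + scale * r * np.cos(a), cy + scale * r * np.sin(a))
                  for r, a in zip(radii, angles)]
        img = Image.new("L", (size, size), color=0)
        # Outline only, no fill: hypothesis (i) says the contour is what matters.
        ImageDraw.Draw(img).polygon(points, outline=255)
        return img

    # The label here is simply the vertex count, i.e., a pure function of the
    # generating formula. Enlarging the label-parameter space (hypothesis ii)
    # would mean labeling jointly on (n_vertices, a noise bin, ...) instead.
    rng = np.random.default_rng(0)
    for label, n in enumerate(range(3, 8)):
        radial_contour(n_vertices=n, noise=0.3, rng=rng).save(f"contour_class{label}.png")

Because every image and label is derived from the same parameters, scaling such a dataset costs only compute, which is the property the abstract contrasts with the labeling cost, errors, and biases of real-image datasets.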