Paper Title

Big Self-Supervised Models are Strong Semi-Supervised Learners

Authors

Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

Abstract

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
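The abstract summarizes the recipe as three steps: SimCLRv2 pretraining of a big ResNet, supervised fine-tuning on the small labeled subset, and distillation into a smaller network using the unlabeled data a second time. Below is a minimal sketch of that pipeline in PyTorch; it is not the authors' code. The SimCLRv2 contrastive pretraining and the fine-tuning loop are only stubbed out, torchvision's ResNet-50 stands in for both teacher and student, and the exact temperature-scaled soft-label cross-entropy is an assumed form of the distillation objective.

```python
# Sketch of the three-step semi-supervised recipe described in the abstract.
# Assumptions: torchvision ResNet-50 as a stand-in architecture; a standard
# temperature-scaled soft-label cross-entropy as the distillation loss.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between temperature-scaled teacher and student distributions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()


# Step 1: unsupervised pretraining of a big teacher network
# (SimCLRv2 contrastive pretraining on unlabeled ImageNet, omitted here).
teacher = resnet50(num_classes=1000)

# Step 2: supervised fine-tuning of the teacher on the few labeled examples
# (standard cross-entropy training loop, omitted for brevity).

# Step 3: distill the fine-tuned teacher into a student using unlabeled images only.
student = resnet50(num_classes=1000)  # in practice the student can be much smaller
optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)

unlabeled_batch = torch.randn(8, 3, 224, 224)  # placeholder for an unlabeled image batch
with torch.no_grad():
    teacher_logits = teacher(unlabeled_batch)   # teacher provides soft pseudo-labels
student_logits = student(unlabeled_batch)

loss = distillation_loss(student_logits, teacher_logits, temperature=1.0)
loss.backward()
optimizer.step()
```

The key design point the abstract emphasizes is that the unlabeled data is used twice: once in a task-agnostic way (contrastive pretraining of a big network) and once in a task-specific way (distilling the fine-tuned teacher's predictions into a smaller student).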
