Paper Title
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
Paper Authors
Paper Abstract
The computer vision world has been regaining enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as SimCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. The latest studies suggest that pre-training benefits from gigantic model capacity. We are hereby curious and ask: after pre-training, does a pre-trained model indeed have to stay large for its downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch, yet still reach the full models' performance. We extend the scope of LTH and question whether matching subnetworks that enjoy the same downstream transfer performance still exist in pre-trained computer vision models. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, SimCLR, and MoCo, we are consistently able to locate such matching subnetworks at 59.04% to 96.48% sparsity that transfer universally to multiple downstream tasks, with no performance degradation compared to using the full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training schemes tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, but more delicate discussions are needed in some cases. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_LTH_Pre-training.
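As a concrete illustration of how such matching subnetworks are typically located, below is a minimal sketch of iterative magnitude pruning (IMP) with weight rewinding in PyTorch. It assumes a torchvision ResNet-50 loaded with supervised ImageNet weights; finetune_on_downstream_task and downstream_loader are hypothetical placeholders, and the sketch is not the exact procedure released in the repository above.

import torch
import torch.nn.utils.prune as prune
import torchvision.models as models

# Load a supervised ImageNet pre-trained backbone (weights enum requires torchvision >= 0.13).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Snapshot of the pre-trained weights, used as the rewinding point.
pretrained = {k: v.clone() for k, v in model.state_dict().items()}

# Prune only convolutional weights, as is common in LTH-style studies.
conv_modules = [(name, m) for name, m in model.named_modules()
                if isinstance(m, torch.nn.Conv2d)]
prunable = [(m, "weight") for _, m in conv_modules]

for imp_round in range(5):  # 5 rounds of 20% -> roughly 67% sparsity
    # 1) (Hypothetical placeholder) fine-tune the current subnetwork on a downstream task.
    # finetune_on_downstream_task(model, downstream_loader)

    # 2) Remove 20% of the still-unpruned conv weights, ranked globally by magnitude.
    prune.global_unstructured(prunable,
                              pruning_method=prune.L1Unstructured,
                              amount=0.2)

    # 3) Rewind the surviving weights to their pre-trained values; keep the binary masks.
    with torch.no_grad():
        for name, module in conv_modules:
            module.weight_orig.copy_(pretrained[f"{name}.weight"])

After the final round, the binary masks together with the rewound pre-trained weights define the sparse subnetwork, which can then be fine-tuned on each downstream task and compared against fine-tuning the full pre-trained model.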