Paper title
OVO: One-shot Vision Transformer Search with Online Distillation
Authors
Abstract
Pure transformers have recently shown great potential for vision tasks. However, their accuracy on small and medium-sized datasets is unsatisfactory. Although some existing methods introduce a CNN as a teacher to guide training through distillation, the gap between the teacher and student networks leads to sub-optimal performance. In this work, we propose OVO, a new One-shot Vision transformer search framework with Online distillation. OVO samples sub-nets for both the teacher and student networks to obtain better distillation results. Benefiting from online distillation, thousands of subnets in the supernet are well trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100.
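The abstract does not specify the search space or the distillation loss, so the following is only a minimal sketch of the general idea: drawing a teacher and a student configuration from one shared search space and measuring how far the student's predictions are from the teacher's. The search-space values, the function names, and the choice of a temperature-softened KL divergence are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random

# Hypothetical search space; these values are illustrative assumptions,
# not the dimensions used by OVO.
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],
}


def sample_subnet(search_space, rng):
    """Pick one choice per dimension to describe a sub-network."""
    return {name: rng.choice(choices) for name, choices in search_space.items()}


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumed loss form; the abstract does not state which loss is used.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


rng = random.Random(0)
# Both networks are sampled from the same supernet search space.
teacher_cfg = sample_subnet(SEARCH_SPACE, rng)
student_cfg = sample_subnet(SEARCH_SPACE, rng)

# Toy logits standing in for the outputs of the two sampled sub-networks.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.5]
loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
```

In a real training loop the two configurations would select weights from a shared supernet, and this loss would typically be combined with the usual classification loss on the ground-truth labels.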