Paper Title

Automated Progressive Learning for Efficient Training of Vision Transformers

Paper Authors

Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

Paper Abstract

Recent advances in vision Transformers (ViTs) have come with a voracious appetite for computing power, highlighting the urgent need to develop efficient training methods for ViTs. Progressive learning, a training scheme where the model capacity grows progressively during training, has started showing its ability in efficient training. In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning. First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth. Then, we propose automated progressive learning (AutoProg), an efficient training scheme that aims to achieve lossless acceleration by automatically increasing the training overload on-the-fly; this is achieved by adaptively deciding whether, where and how much the model should grow during progressive learning. Specifically, we first relax the optimization of the growth schedule to a sub-network architecture optimization problem, then propose one-shot estimation of sub-network performance via an elastic supernet. The search overhead is reduced to a minimum by recycling the parameters of the supernet. Extensive experiments on efficient training on ImageNet with two representative ViT models, DeiT and VOLO, demonstrate that AutoProg can accelerate ViT training by up to 85.1% with no performance drop. Code: https://github.com/changlin31/AutoProg
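To make the training scheme described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a progressive-learning loop: a toy Transformer grows in depth at fixed stage boundaries, and newly added blocks are initialized from a momentum (EMA) copy of the network, in the spirit of MoGrow. All names here (TinyViT, grow_depth, update_momentum) and the fixed growth schedule are illustrative assumptions, not the authors' API; AutoProg's actual contribution is replacing this manual schedule with automated whether/where/how-much growth decisions via an elastic supernet (see the linked repository for the real implementation).

import copy
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """A toy stand-in for a ViT: a stack of identical encoder blocks."""
    def __init__(self, dim=64, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )
        self.head = nn.Linear(dim, 10)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(x.mean(dim=1))

def grow_depth(model, momentum_model, n_new):
    """Append n_new blocks, initializing each from the momentum (EMA) model's
    last block instead of from scratch -- a rough stand-in for the paper's
    MoGrow idea of bridging the gap brought by model growth."""
    for _ in range(n_new):
        new_block = copy.deepcopy(momentum_model.blocks[-1])
        model.blocks.append(new_block)
        momentum_model.blocks.append(copy.deepcopy(new_block))

@torch.no_grad()
def update_momentum(model, momentum_model, m=0.999):
    """Standard EMA update of the momentum model's parameters."""
    for p_m, p in zip(momentum_model.parameters(), model.parameters()):
        p_m.mul_(m).add_(p, alpha=1 - m)

model = TinyViT(depth=2)
momentum_model = copy.deepcopy(model)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Fixed manual schedule (illustrative): grow by 2 blocks at two stage
# boundaries. AutoProg would instead decide this on-the-fly.
grow_at = {100: 2, 200: 2}

for step in range(300):
    x = torch.randn(8, 16, 64)           # dummy tokens: (batch, seq, dim)
    y = torch.randint(0, 10, (8,))       # dummy labels
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    update_momentum(model, momentum_model)
    if step + 1 in grow_at:
        grow_depth(model, momentum_model, grow_at[step + 1])
        # Rebuild the optimizer so the newly added parameters are registered.
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

Early stages train the small, cheap network, so most optimization steps are inexpensive; the EMA-based initialization is meant to keep the loss from spiking when capacity is added.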
