FocusFormer: Focusing on What We Need via Architecture Sampler

Paper title

FocusFormer: Focusing on What We Need via Architecture Sampler

Authors

Jing Liu, Jianfei Cai, Bohan Zhuang

Abstract

Vision Transformers (ViTs) have underpinned the recent breakthroughs in computer vision. However, designing the architectures of ViTs is laborious and heavily relies on expert knowledge. To automate the design process and incorporate deployment flexibility, one-shot neural architecture search decouples the supernet training and architecture specialization for diverse deployment scenarios. To cope with an enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample some of them in each update step during training. During architecture search, however, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge such a gap. To this end, we propose to learn an architecture sampler to assign higher sampling probabilities to those architectures on the Pareto frontier under different resource constraints during supernet training, making them sufficiently optimized and hence improving their performance. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures satisfying the given resource constraint, which significantly improves the search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that our FocusFormer is able to improve the performance of the searched architectures while significantly reducing the search cost. For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in terms of the Top-1 accuracy.
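
As a rough illustration of the idea in the abstract, the sketch below computes the Pareto frontier of a few candidate architectures over (FLOPs, accuracy) and gives frontier members a higher sampling probability than the rest. It is not the paper's method: FocusFormer learns its sampler, whereas this toy uses a fixed weighting. The candidate names, the FLOPs and accuracy numbers, and the `focus` factor are all hypothetical values chosen only for the example.

```python
import random


def pareto_frontier(candidates):
    """Return the candidates that no other candidate dominates.

    A candidate is dominated when another one has FLOPs no higher and
    accuracy no lower, with at least one of the two strictly better.
    """
    frontier = []
    for name, flops, acc in candidates:
        dominated = any(
            (f <= flops and a >= acc) and (f < flops or a > acc)
            for _, f, a in candidates
        )
        if not dominated:
            frontier.append((name, flops, acc))
    return sorted(frontier, key=lambda c: c[1])


def sampling_weights(candidates, focus=4.0):
    """Assign frontier members `focus` times the weight of the others.

    The fixed `focus` factor is an assumption standing in for a learned
    sampler; weights are normalized to sum to 1.
    """
    frontier_names = {name for name, _, _ in pareto_frontier(candidates)}
    raw = [focus if name in frontier_names else 1.0 for name, _, _ in candidates]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical candidates: (name, GFLOPs, top-1 accuracy in %).
# These numbers are made up and are not results from the paper.
candidates = [
    ("arch_a", 1.0, 70.0),
    ("arch_b", 1.4, 74.0),
    ("arch_c", 1.4, 72.0),  # dominated by arch_b
    ("arch_d", 2.0, 73.0),  # dominated by arch_b
    ("arch_e", 2.5, 76.0),
]

frontier = pareto_frontier(candidates)
weights = sampling_weights(candidates)

# Draw one architecture for a training step, biased toward the frontier.
rng = random.Random(0)
sampled = rng.choices(candidates, weights=weights, k=1)[0]
```

Uniform sampling corresponds to `focus=1.0`; raising `focus` spends more training updates on the architectures that a later search would actually select.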
