Paper Title
Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
Paper Authors
Paper Abstract
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to their high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency, i.e., insufficient training of subnetworks within the hypernetwork. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of the prioritized path, which refers to architecture candidates that exhibit superior performance during training. Distilling knowledge from the prioritized paths boosts the training of subnetworks. Since the prioritized paths change on the fly depending on their performance and complexity, the finally obtained paths are the cream of the crop. We directly select the most promising one from the prioritized paths as the final architecture, without using other complex search methods such as reinforcement learning or evolutionary algorithms. Experiments on ImageNet verify that this path distillation method improves the convergence rate and performance of the hypernetwork, as well as boosting the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. Moreover, experiments on object detection and a more challenging search space show the generality and robustness of the proposed method. Code and models are available at https://github.com/microsoft/cream.git.
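The mechanism the abstract describes can be pictured as two pieces: a board that keeps the top-k subnetworks seen so far (updated on the fly, subject to a complexity budget) and a distillation loss in which the best board entry teaches the currently sampled subnetwork. Below is a minimal Python sketch of that idea under those assumptions; names such as `PathBoard`, `flops_budget`, and `distillation_loss` are illustrative and are not the API of the actual repository at https://github.com/microsoft/cream.git.

```python
import heapq
import torch.nn.functional as F

class PathBoard:
    """Illustrative 'prioritized path board': top-k candidates kept on the fly."""

    def __init__(self, size=10, flops_budget=600e6):
        self.size = size
        self.flops_budget = flops_budget  # assumed complexity constraint
        self._heap = []  # min-heap of (score, path); worst entry evicted first

    def update(self, path, score, flops):
        """Admit a path only if it satisfies the complexity budget."""
        if flops > self.flops_budget:
            return
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, (score, path))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, path))

    def best(self):
        """Most promising path so far -- the 'cream of the crop'."""
        return max(self._heap)[1] if self._heap else None


def distillation_loss(student_logits, teacher_logits, labels, weight=1.0):
    """Hard-label cross-entropy plus soft-label KL from a prioritized path."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1),
        reduction="batchmean",
    )
    return hard + weight * soft


# Usage: after evaluating each sampled subnetwork, record it on the board;
# the current best path then serves as the teacher for later samples.
board = PathBoard(size=5)
board.update(path=("mb3", "mb5", "mb7"), score=0.72, flops=450e6)
board.update(path=("mb5", "mb5", "mb3"), score=0.75, flops=520e6)
print(board.best())  # -> ('mb5', 'mb5', 'mb3')
```

Because the board admits only paths within the budget and evicts the weakest entry first, the final architecture can be read off directly as the board's best path, which is why no separate reinforcement-learning or evolutionary search stage is needed.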