Paper Title
Disturbance-immune Weight Sharing for Neural Architecture Search
Paper Authors
Paper Abstract
Neural architecture search (NAS) has gained increasing attention in the architecture design community. One of the key factors behind this success is the training efficiency afforded by the weight sharing (WS) technique. However, WS-based NAS methods often suffer from a performance disturbance (PD) issue: because weights are partially shared, training subsequent architectures inevitably disturbs the performance of previously trained ones. This leads to inaccurate performance estimates for the earlier architectures, which makes it hard to learn a good search strategy. To alleviate this issue, we propose a new disturbance-immune update strategy. Specifically, to preserve the knowledge learned by previous architectures, we constrain the training of subsequent architectures to an orthogonal space via orthogonal gradient descent. Equipped with this strategy, we propose a novel disturbance-immune training scheme for NAS. We theoretically analyze the effectiveness of our strategy in alleviating the PD risk, and extensive experiments on CIFAR-10 and ImageNet verify the superiority of our method.
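The core mechanism described in the abstract is orthogonal gradient descent: when a subsequent architecture is trained, its gradient on the shared weights is projected onto the orthogonal complement of gradient directions recorded from previously trained architectures, so that, to first order, their learned behavior is undisturbed. The following is a minimal PyTorch sketch of that projection on a toy shared weight vector; the names (`project_out`, `loss_fn`), the loss, and the single-vector basis are illustrative assumptions, not the paper's implementation.

```python
import torch

def project_out(grad, basis):
    # Remove from `grad` its component along each (orthonormal) basis
    # vector, keeping only the part orthogonal to the gradient
    # directions recorded from previously trained architectures.
    for b in basis:
        grad = grad - torch.dot(grad, b) * b
    return grad

# Toy example: one shared weight vector used by two "architectures".
w = torch.randn(10, requires_grad=True)
basis = []  # orthonormal gradient directions from earlier architectures

def loss_fn(w, target):  # hypothetical per-architecture loss
    return ((w - target) ** 2).sum()

# Train the first architecture and record its gradient direction.
loss = loss_fn(w, torch.ones(10))
g = torch.autograd.grad(loss, w)[0]
basis.append(g / g.norm())

# Train a subsequent architecture with its gradient projected into the
# orthogonal complement, so the first architecture's performance is
# (to first order) left undisturbed.
loss = loss_fn(w, -torch.ones(10))
g = torch.autograd.grad(loss, w)[0]
g = project_out(g, basis)
with torch.no_grad():
    w -= 0.1 * g
```

In a setting with many architectures, each newly recorded direction would presumably be Gram-Schmidt-orthonormalized against the existing basis before being appended, so the projection in `project_out` remains valid.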