Paper Title

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

Authors

Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari

Abstract

Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy to generate a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, compared to traditional scaling methods alone, for example compressing ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures. The code is available at https://github.com/apple/ml-spin.
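The compression the abstract describes comes from the defining property of isotropic networks: every block has the same input/output shape, so one parameter set can be reused by several blocks while the FLOP count is unchanged. A minimal sketch of this arithmetic (not the paper's implementation; the function name and toy numbers are hypothetical):

```python
# Cross-layer weight sharing in an isotropic network: because all blocks
# are identically shaped, blocks can reuse one parameter set in groups.

def unique_param_count(depth, params_per_block, share_group_size=1):
    """Unique parameters stored when consecutive blocks share weights in
    groups of `share_group_size` (1 = no sharing, i.e., the baseline)."""
    # Only ceil(depth / share_group_size) distinct parameter sets exist;
    # compute/FLOPs are unaffected since every block still runs.
    num_groups = -(-depth // share_group_size)  # ceiling division
    return num_groups * params_per_block

# Toy ConvMixer-style configuration (illustrative numbers only):
depth, per_block = 20, 100_000
baseline = unique_param_count(depth, per_block)        # 20 * 100k = 2,000,000
shared = unique_param_count(depth, per_block, 2)       # 10 * 100k = 1,000,000
print(baseline / shared)                               # 2.0x parameter compression
```

Sharing every pair of adjacent blocks halves the stored parameters here, which is in the same regime as the 1.9x ConvMixer compression the abstract reports; the paper's contribution is evaluating *which* sharing patterns in this design space preserve or improve accuracy.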
