Paper Title
Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models
Paper Authors
Paper Abstract
Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. Many techniques have evolved over the past decade that make models lighter, faster, and more robust with better generalization. However, many deep learning practitioners persist with pre-trained models and architectures trained mostly on standard datasets such as ImageNet, MS-COCO, the IMDB-WIKI dataset, and Kinetics-700, and are either hesitant about or unaware of redesigning an architecture from scratch, even though doing so can lead to better performance. This leads to inefficient models that are not suitable for devices such as mobile, edge, and fog. In addition, these conventional training methods are of concern because they consume a lot of computing power. In this paper, we revisit various SOTA techniques that deal with architecture efficiency (Global Average Pooling, depth-wise convolutions, squeeze-and-excitation, BlurPool), learning rate (Cyclical Learning Rate), data augmentation (Mixup, Cutout), label manipulation (label smoothing), weight-space manipulation (stochastic weight averaging), and the optimizer (sharpness-aware minimization). We demonstrate how an efficient deep convolutional network can be built in a phased manner by sequentially reducing the number of training parameters and applying the techniques mentioned above. We achieved a SOTA accuracy of 99.2% on MNIST with just 1,500 parameters and an accuracy of 86.01% with just over 140K parameters on the CIFAR-10 dataset.
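To make the list of techniques above concrete, the following is a minimal PyTorch sketch, not the authors' code, showing how a few of the named ideas (depth-wise separable convolution, squeeze-and-excitation, a Global Average Pooling head, label smoothing, and a cyclical learning rate) could be wired together; all class names, layer widths, and hyperparameters here are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch only: illustrates techniques named in the abstract,
# not the architecture or hyperparameters used by the authors.
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: re-weight channels using globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        s = x.mean(dim=(2, 3))                                   # squeeze: global average pool
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))     # excite: channel gates
        return x * s[:, :, None, None]

class EfficientBlock(nn.Module):
    """Depth-wise + point-wise convolution with SE, replacing a dense 3x3 convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.se = SqueezeExcite(out_ch)

    def forward(self, x):
        return self.se(torch.relu(self.bn(self.pointwise(self.depthwise(x)))))

class TinyNet(nn.Module):
    """Small classifier ending in Global Average Pooling instead of large fully connected layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)
        self.block = EfficientBlock(16, 32)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.block(torch.relu(self.stem(x)))
        x = x.mean(dim=(2, 3))                                   # global average pooling
        return self.head(x)

# Illustrative training-side knobs (assumed defaults, not the paper's settings):
model = TinyNet()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)             # label smoothing
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1)  # cyclical LR
```

Data augmentation (Mixup, Cutout), stochastic weight averaging, and sharpness-aware minimization from the abstract would plug into the training loop built around these objects; they are omitted here to keep the sketch short.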