Paper Title

Rethinking Weight Decay For Efficient Neural Network Pruning

Authors

Hugo Tessier, Vincent Gripon, Mathieu Léonardon, Matthieu Arzel, Thomas Hannagan, David Bertrand

Abstract

Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
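
To make the mechanism concrete, below is a minimal PyTorch sketch of the general idea behind selective weight decay: an extra L2 penalty applied only to the weights currently targeted for pruning, here chosen by magnitude within each weight tensor. The function name, the per-tensor magnitude-based selection, and the fixed coefficient `mu` are illustrative assumptions, not the authors' exact formulation; how the penalty is scheduled over training follows the paper, which the abstract does not detail.

```python
import torch
import torch.nn as nn


def selective_weight_decay_penalty(model: nn.Module, prune_ratio: float, mu: float) -> torch.Tensor:
    # Hypothetical helper: extra L2 penalty on the weights currently targeted
    # for removal, here the smallest-magnitude fraction of each weight tensor.
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for param in model.parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        magnitudes = param.detach().abs().flatten()
        k = int(prune_ratio * magnitudes.numel())
        if k == 0:
            continue
        threshold = torch.kthvalue(magnitudes, k).values  # magnitude cutoff for the target set
        mask = param.detach().abs() <= threshold           # weights below the cutoff
        penalty = penalty + mu * (param[mask] ** 2).sum()  # decay only those weights
    return penalty


# Usage sketch: add the selective penalty to the task loss at every training step, e.g.
# loss = criterion(model(x), y) + selective_weight_decay_penalty(model, prune_ratio=0.9, mu=1e-4)
```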
