Paper Title

Statistical Mechanical Analysis of Neural Network Pruning

Paper Authors

Rupam Acharyya, Ankani Chattoraj, Boyu Zhang, Shouman Das, Daniel Stefankovic

Paper Abstract

Deep learning architectures with a huge number of parameters are often compressed using pruning techniques to ensure computational efficiency of inference during deployment. Despite a multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We examine different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. It has been shown that the Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. Using the GE bounds in the aforementioned setup, we provide theoretical guarantees for these empirical observations. Another consistent finding in the literature is that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters. We use our theoretical setup to prove this finding and show that even the baseline random edge pruning method performs better than the DPP node pruning method. We also validate this empirically on real datasets.
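To make the node-versus-edge pruning comparison concrete, here is a minimal sketch (not the paper's implementation) of pruning a single hidden layer under a fixed parameter budget. Node pruning zeroes out entire hidden units, while edge pruning zeroes out individual weights; the layer size, the budget, and the use of uniform random selection (standing in for the DPP sampling the paper studies) are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: node pruning vs. edge pruning of one hidden layer,
# both constrained to keep the same number of nonzero weights.
rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16
W = rng.normal(size=(d_hidden, d_in))  # one row of W per hidden node

# Node pruning: keep 4 of 16 hidden units (uniform random choice here is a
# stand-in for the DPP-based node selection discussed in the abstract).
keep_nodes = 4
kept_rows = rng.choice(d_hidden, size=keep_nodes, replace=False)
W_node_pruned = np.zeros_like(W)
W_node_pruned[kept_rows] = W[kept_rows]

# Edge pruning with the same budget: keep keep_nodes * d_in individual weights,
# chosen uniformly at random (the "baseline random edge pruning" of the abstract).
budget = keep_nodes * d_in
flat_idx = rng.choice(W.size, size=budget, replace=False)
mask = np.zeros(W.size, dtype=bool)
mask[flat_idx] = True
W_edge_pruned = np.where(mask.reshape(W.shape), W, 0.0)

print("nonzero weights (node pruned):", np.count_nonzero(W_node_pruned))
print("nonzero weights (edge pruned):", np.count_nonzero(W_edge_pruned))
```

Both pruned matrices retain 32 nonzero weights, but the edge-pruned network spreads them across all hidden units, which is the sparse-versus-dense regime the paper analyzes.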
