Paper Title
Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices
Paper Authors
Paper Abstract
The increasing computational cost of deep neural network models limits the applicability of intelligent applications on resource-constrained edge devices. Although a number of neural network pruning methods have been proposed to compress models, prevailing approaches focus only on parametric operators (e.g., convolution), which may miss optimization opportunities. In this paper, we present a novel fusion-catalyzed pruning approach, called FuPruner, which simultaneously optimizes parametric and non-parametric operators to accelerate neural networks. We introduce an aggressive fusion method that equivalently transforms a model, extending the optimization space of pruning and enabling non-parametric operators to be pruned in a similar manner to parametric operators, and we apply a dynamic filter pruning method to decrease the computational cost of models while meeting accuracy requirements. Moreover, FuPruner provides configurable optimization options for controlling fusion and pruning, allowing more flexible performance-accuracy trade-offs. Evaluation with state-of-the-art residual neural networks on five representative intelligent edge platforms (Jetson TX2, Jetson Nano, Edge TPU, NCS, and NCS2) demonstrates the effectiveness of our approach, which accelerates model inference on the CIFAR-10 and ImageNet datasets.
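To make the fusion-then-prune idea concrete, the sketch below illustrates a generic version of the two ingredients the abstract names: folding a non-parametric operator (here, an inference-time batch normalization) into the preceding convolution so that only parametric operators remain, and then pruning convolution filters by a simple magnitude criterion. This is a minimal NumPy illustration of the general technique, not the paper's actual FuPruner algorithm; the function names, the L1 pruning criterion, and the `keep_ratio` parameter are assumptions for the example.

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time BatchNorm (a non-parametric operator) into the
    preceding convolution, yielding a single parametric operator.
    W: (out_ch, in_ch, kH, kW), b: (out_ch,); BN stats are per output channel."""
    scale = gamma / np.sqrt(var + eps)           # per-channel BN scale
    W_fused = W * scale[:, None, None, None]     # scale each output filter
    b_fused = (b - mean) * scale + beta          # fold shift into the bias
    return W_fused, b_fused

def prune_filters_l1(W, b, keep_ratio=0.5):
    """Keep the filters with the largest L1 norms (one simple, generic
    pruning criterion; not the paper's dynamic criterion)."""
    norms = np.abs(W).reshape(W.shape[0], -1).sum(axis=1)
    k = max(1, int(round(W.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[::-1][:k])  # indices of kept filters
    return W[keep], b[keep], keep

# Example: fuse Conv+BN, then prune half of the output filters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3, 3, 3)); b = rng.standard_normal(8)
gamma, beta = rng.standard_normal(8), rng.standard_normal(8)
mean, var = rng.standard_normal(8), rng.random(8) + 0.1
W_fused, b_fused = fuse_conv_bn(W, b, gamma, beta, mean, var)
W_pruned, b_pruned, kept = prune_filters_l1(W_fused, b_fused, keep_ratio=0.5)
print(W_pruned.shape)  # (4, 3, 3, 3)
```

After fusion, the BatchNorm has disappeared into the convolution's weights and bias, so any filter-pruning method that operates on parametric operators applies directly, which is the sense in which fusion "catalyzes" pruning.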