Title
Pruning by Active Attention Manipulation
Authors
Abstract
Filter pruning of a CNN is typically achieved by applying discrete masks to the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM), which sparsifies the CNN's set of filters through a particular attention mechanism, during training. PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores. As the filters are not independent, we use attention to dynamically learn their correlations. Moreover, by training the pruning scores of all layers simultaneously, PAAM can account for layer inter-dependencies, which is essential to finding a performant sparse sub-network. PAAM can also train and generate a pruned network from scratch in a straightforward, one-stage training process, without requiring a pre-trained network. Finally, PAAM does not need layer-specific hyperparameters or pre-defined layer budgets, since it can implicitly determine the appropriate number of filters in each layer. Our experimental results on different network architectures suggest that PAAM outperforms state-of-the-art (SOTA) structured-pruning methods. On the CIFAR-10 dataset, without requiring a pre-trained baseline network, we obtain accuracy gains of 1.02% and 1.19% and parameter reductions of 52.3% and 54%, on ResNet56 and ResNet110, respectively. Similarly, on the ImageNet dataset, PAAM achieves a 1.06% accuracy gain while pruning 51.1% of the parameters of ResNet50. On CIFAR-10, this exceeds the SOTA by margins of 9.5% and 6.6%, respectively, and on ImageNet by a margin of 11%.
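The mechanism the abstract describes — analog filter scores derived from the filter weights via an attention step over the filters, with an additive sparsity-inducing term on the scores — might be sketched roughly as below. This is a minimal illustration, not the paper's exact formulation: the projection matrices `Wq` and `Wk`, the sigmoid scoring head, and the L1 regularization weight `reg_lambda` are all assumptions made for the example.

```python
import numpy as np

def filter_scores(weights, Wq, Wk, reg_lambda=1e-3):
    """Hypothetical sketch of PAAM-style filter scoring.

    weights:    (F, k) array, one row of flattened weights per filter.
    Wq, Wk:     (k, d) learnable projections for a dot-product attention
                over the F filters, capturing their correlations.
    reg_lambda: weight of the additive sparsity term on the scores.

    Returns analog scores in (0, 1), one per filter, plus the additive
    regularization term that would be added to the training cost.
    """
    q = weights @ Wq                                  # queries, one per filter
    k = weights @ Wk                                  # keys, one per filter
    logits = q @ k.T / np.sqrt(q.shape[1])            # scaled dot-product
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # softmax over filters
    context = attn @ weights                          # correlation-aware embedding
    s = 1.0 / (1.0 + np.exp(-context.sum(axis=1)))    # analog scores in (0, 1)
    reg = reg_lambda * np.abs(s).sum()                # additive sparsity term
    return s, reg

# Tiny usage example with random "filter weights" (8 filters of 3x3x3).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 27))
scores, reg = filter_scores(W, rng.standard_normal((27, 4)),
                            rng.standard_normal((27, 4)))
```

In an actual training loop the scores would multiply the filter outputs as soft masks, `reg` would be added to the task loss, and filters whose scores collapse toward zero would be pruned — which is how the additive term can implicitly set a per-layer filter budget.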