Paper Title

Filter Grafting for Deep Neural Networks: Reason, Method, and Cultivation

Authors

Hao Cheng, Fanxu Meng, Ke Li, Yuting Gao, Guangming Lu, Xing Sun, Rongrong Ji

Abstract

Filters are a key component of modern convolutional neural networks (CNNs). However, since CNNs are usually over-parameterized, a pre-trained network always contains some invalid (unimportant) filters. These filters have a relatively small $l_{1}$ norm and contribute little to the output (Reason). While filter pruning removes these invalid filters for the sake of efficiency, we instead reactivate them to improve the representation capability of CNNs. In this paper, we introduce filter grafting (Method) to achieve this goal. The reactivation is performed by grafting external information (weights) into invalid filters. To better perform the grafting, we develop a novel criterion to measure the information of filters and an adaptive weighting strategy to balance the grafted information among networks. After the grafting operation, the network has fewer invalid filters than in its initial state, which gives the model more representation capacity. Meanwhile, since grafting is operated reciprocally on all networks involved, we find that grafting may lose the information of valid filters while improving invalid filters. To gain a universal improvement on both valid and invalid filters, we complement grafting with distillation (Cultivation) to overcome this drawback. Extensive experiments are performed on classification and recognition tasks to show the superiority of our method. Code is available at https://github.com/fxmeng/filter-grafting.
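The grafting idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the information criterion or the adaptive weighting strategy, so the sketch assumes a plain $l_{1}$-norm threshold to flag invalid filters and a fixed blending coefficient `alpha`. The names `l1_norm`, `invalid_filter_indices`, `graft_layer`, `threshold`, and `alpha` are all hypothetical, and each filter is assumed to be flattened into a list of weights.

```python
from typing import List

# Assumption: one convolution filter, flattened to a plain list of weights.
Filter = List[float]


def l1_norm(f: Filter) -> float:
    """Sum of the absolute values of a filter's weights."""
    return sum(abs(w) for w in f)


def invalid_filter_indices(layer: List[Filter], threshold: float) -> List[int]:
    """Indices of filters whose l1 norm is below a hypothetical threshold."""
    return [i for i, f in enumerate(layer) if l1_norm(f) < threshold]


def graft_layer(
    own: List[Filter], donor: List[Filter], threshold: float, alpha: float
) -> List[Filter]:
    """Return a copy of `own` in which every invalid filter is blended with
    the donor network's filter at the same index.

    `alpha` is the share kept from the network's own filter; a fixed value
    stands in here for the adaptive weighting mentioned in the abstract.
    """
    invalid = set(invalid_filter_indices(own, threshold))
    grafted = []
    for i, f in enumerate(own):
        if i in invalid:
            grafted.append(
                [alpha * a + (1 - alpha) * b for a, b in zip(f, donor[i])]
            )
        else:
            grafted.append(list(f))  # valid filters are left untouched
    return grafted


# Toy layers from two networks; the second filter of layer_a is all zeros.
layer_a = [[0.5, -0.25, 0.75], [0.0, 0.0, 0.0]]
layer_b = [[0.25, 0.25, -0.5], [1.0, -1.0, 0.5]]

grafted = graft_layer(layer_a, layer_b, threshold=0.5, alpha=0.5)
print(invalid_filter_indices(layer_a, 0.5))
print(grafted)
```

In this toy case only the all-zero filter of `layer_a` falls below the threshold, so only that filter receives weights from `layer_b`. Running the same function with the two layers swapped would sketch the reciprocal direction the abstract mentions.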
