Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

Paper Title

Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

Authors

Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

Abstract

The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy. Despite many exciting efforts, one piece of "common sense" is rarely challenged: a winning ticket is found by iterative magnitude pruning (IMP), and hence the resulting pruned subnetworks have only unstructured sparsity. That gap limits the appeal of winning tickets in practice, since highly irregular sparse patterns are challenging to accelerate on hardware. Meanwhile, directly substituting structured pruning for unstructured pruning in IMP damages performance more severely and is usually unable to locate winning tickets. In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general. The core idea is to append "post-processing techniques" after each round of (unstructured) IMP, to enforce the formation of structural sparsity. Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns. Both our identified channel-wise and group-wise structural subnetworks win the lottery, with substantial inference speedups readily supported by existing hardware. Extensive experiments, conducted on diverse datasets across multiple network backbones, consistently validate our proposal, showing that the hardware acceleration roadblock of LTH is now removed. Specifically, the structural winning tickets obtain up to {64.93%, 64.84%, 60.23%} running time savings at {36%~80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet}, while maintaining comparable accuracy. Code is at https://github.com/VITA-Group/Structure-LTH.
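
To make the "re-fill" step concrete, the following is a minimal sketch of one round of unstructured magnitude pruning followed by a channel-wise re-fill on a toy weight matrix. It is not the authors' implementation (see the repository linked above for that). The function names `magnitude_prune` and `refill`, the toy weights, and in particular the channel score (the sum of absolute values of the weights that survive pruning) are assumptions made for illustration, since the abstract does not state how channel importance is measured. The "re-group" step and the retraining between IMP rounds are not sketched.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning over all entries.

    weights: list of channels, each channel a list of floats.
    Returns a 0/1 mask with the same shape. Entries whose magnitude is
    at or below the threshold are pruned, so ties at the threshold can
    prune slightly more than the requested fraction.
    """
    magnitudes = sorted(abs(w) for channel in weights for w in channel)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [[1] * len(channel) for channel in weights]
    threshold = magnitudes[n_prune - 1]
    return [[1 if abs(w) > threshold else 0 for w in channel]
            for channel in weights]


def refill(weights, mask, n_keep):
    """Channel-wise re-fill (illustrative).

    Scores each channel by the L1 mass of its surviving weights
    (an assumed importance criterion), fully restores the n_keep
    best channels, and removes every other channel entirely.
    """
    scores = [sum(abs(w) * m for w, m in zip(channel, channel_mask))
              for channel, channel_mask in zip(weights, mask)]
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [[1] * len(channel) if i in keep else [0] * len(channel)
            for i, channel in enumerate(weights)]


# Toy layer: 4 output channels with 4 weights each (hypothetical values).
weights = [
    [0.90, -0.80, 0.05, 0.70],
    [0.01, 0.02, -0.03, 0.04],
    [0.60, 0.06, -0.50, 0.10],
    [0.07, -0.95, 0.08, 0.09],
]

unstructured = magnitude_prune(weights, sparsity=0.5)
structured = refill(weights, unstructured, n_keep=2)

for row in unstructured:
    print(row)
print()
for row in structured:
    print(row)
```

In this toy case both masks keep 8 of the 16 weights, but the unstructured mask scatters its zeros across channels, whereas the re-filled mask keeps channels 0 and 2 whole and drops channels 1 and 3 entirely. A mask of the second kind lets the layer shrink to a smaller dense layer, which is the kind of pattern the abstract describes as readily supported by existing hardware.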
