Title
A Directed-Evolution Method for Sparsification and Compression of Neural Networks, with Application to Object Identification and Segmentation and Considerations of Optimal Quantization Using a Small Number of Bits
Authors
Abstract
This work introduces the Directed-Evolution (DE) method for sparsification of neural networks, in which the relevance of parameters to network accuracy is assessed directly, and the parameters that produce the least effect on accuracy when tentatively zeroed are indeed zeroed. The DE method avoids a potential combinatorial explosion over all candidate sets of parameters to be zeroed in large networks by mimicking evolution in the natural world. DE operates in a distillation context [5]: the original network is the teacher, and DE evolves the student network toward the sparsification goal while maintaining minimal divergence between teacher and student. After DE reaches the desired sparsification level in each layer of the network, a variety of quantization alternatives are applied to the surviving parameters to find the lowest number of bits for their representation with acceptable loss of accuracy. A procedure for finding the optimal distribution of quantization levels in each sparsified layer is presented. A suitable lossless encoding of the surviving quantized parameters is then used for the final parameter representation. DE was applied to a sample of representative, progressively larger neural networks using the MNIST, FashionMNIST, and COCO data sets. An 80-class YOLOv3 network with more than 60 million parameters, trained on the COCO dataset, reached 90% sparsification and correctly identifies and segments all objects identified by the original network with more than 80% confidence using 4-bit parameter quantization, for a compression of 40x to 80x. It has not escaped the authors' notice that techniques from different methods can be nested: once the best parameter set for sparsification is identified in a cycle of DE, the decision to zero only a subset of those parameters can be made using a combination of criteria such as parameter magnitude and Hessian approximations.
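The selection loop the abstract describes, tentatively zeroing candidate parameter subsets and committing the subset that least increases teacher-student divergence, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `de_sparsify_layer`, `eval_divergence`, and the generation parameters are hypothetical names, and a real system would measure divergence between teacher and student outputs on a validation batch rather than on a toy objective.

```python
import numpy as np

def de_sparsify_layer(weights, eval_divergence, target_sparsity,
                      candidates_per_gen=8, frac_per_gen=0.05, rng=None):
    """One layer's directed-evolution cycle (illustrative sketch):
    each generation proposes several candidate subsets of surviving
    weights to zero, scores each tentative zeroing by the resulting
    teacher-student divergence, commits the fittest candidate, and
    repeats until the sparsity target is reached."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = weights.astype(float).copy()
    alive = np.flatnonzero(w)
    while alive.size > (1.0 - target_sparsity) * w.size:
        n_zero = max(1, int(frac_per_gen * alive.size))
        best_div, best_idx = None, None
        for _ in range(candidates_per_gen):
            idx = rng.choice(alive, size=n_zero, replace=False)
            trial = w.copy()
            trial[idx] = 0.0                 # tentative zeroing
            div = eval_divergence(trial)     # student-vs-teacher score
            if best_div is None or div < best_div:
                best_div, best_idx = div, idx
        w[best_idx] = 0.0                    # commit the fittest subset
        alive = np.flatnonzero(w)
    return w
```

As a usage example, a stand-in divergence could compare a layer's output on a probe input against the dense original, e.g. `eval_divergence = lambda w: abs(w @ x - w_dense @ x)`; greedily committing only the best candidate per generation is what keeps the search linear in the number of generations instead of combinatorial in candidate subsets.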