Paper Title
Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Paper Authors
Paper Abstract
Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially when the bit-width assignment is optimized by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce inference memory and energy by up to 63% and 27%, respectively, compared to a layer-wise approach at the same accuracy.
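To make the channel-wise idea concrete, below is a minimal sketch of a convolution whose weights are fake-quantized with a learnable, per-output-channel choice among candidate bit-widths. It assumes PyTorch; the class name `ChannelWiseMixedPrecConv`, the candidate bit-widths, and the DNAS-style softmax relaxation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWiseMixedPrecConv(nn.Module):
    """Conv2d whose weights get a per-output-channel, searchable bit-width.

    Illustrative sketch only: a DNAS-style softmax relaxation over candidate
    bit-widths, with one set of architecture parameters per output channel.
    """

    def __init__(self, in_ch, out_ch, kernel_size, bitwidths=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)
        self.bitwidths = bitwidths
        # One architecture parameter per (output channel, candidate bit-width).
        self.alpha = nn.Parameter(torch.zeros(out_ch, len(bitwidths)))

    @staticmethod
    def fake_quantize(w, bits):
        # Symmetric fake quantization with a straight-through estimator (STE).
        scale = w.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        q = torch.round(w / scale).clamp(-2 ** (bits - 1), 2 ** (bits - 1) - 1)
        # Forward pass uses the quantized weights; backward is the identity.
        return w + (q * scale - w).detach()

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=1)          # (out_ch, n_candidates)
        w = self.conv.weight                          # (out_ch, in_ch, k, k)
        # Each output channel mixes its quantized variants, weighted by its
        # probabilities; after the search, each channel keeps the argmax bits.
        w_q = torch.zeros_like(w)
        for i, bits in enumerate(self.bitwidths):
            w_q = w_q + probs[:, i].view(-1, 1, 1, 1) * self.fake_quantize(w, bits)
        return F.conv2d(x, w_q, self.conv.bias, padding=self.conv.padding)
```

In a full search, the architecture parameters `alpha` would be trained jointly with the weights under an added cost term (e.g., a model-size or energy proxy), so that channels carrying less informative features converge to the lower bit-widths while the most informative ones retain higher precision.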