Paper Title

SDQ: Stochastic Differentiable Quantization with Mixed Precision

Paper Authors

Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu, Jeffry Wicaksana, Eric Xing, Kwang-Ting Cheng

Paper Abstract

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware that supports mixed-bitwidth arithmetic operations becomes available, recent research on mixed precision quantization (MPQ) has begun to fully leverage the representational capacity of a network by searching optimized bitwidths for its different layers and modules. However, previous studies mainly search the MPQ strategy in a costly scheme using reinforcement learning, neural architecture search, etc., or simply utilize partial prior knowledge for bitwidth assignment, which might be biased and sub-optimal. In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with smoother gradient approximation. In particular, Differentiable Bitwidth Parameters (DBPs) are employed as the probability factors in stochastic quantization between adjacent bitwidth choices. After the optimal MPQ strategy is acquired, we further train our network with entropy-aware bin regularization and knowledge distillation. We extensively evaluate our method for several networks on different hardware (GPUs and FPGA) and datasets. SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods at a lower bitwidth and is even better than the full-precision counterparts across various ResNet and MobileNet families, demonstrating the effectiveness and superiority of our method.
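
The abstract only outlines the mechanism, so the following is a minimal PyTorch-style sketch of the core idea: stochastic quantization between two adjacent bitwidths, gated by a learnable Differentiable Bitwidth Parameter (DBP). The names `StochasticBitwidthQuantizer` and `uniform_quantize`, the sigmoid parameterization, and the straight-through relaxation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def uniform_quantize(x, bits):
    """Symmetric uniform quantizer with a straight-through estimator (STE)
    for the rounding step; the scale is the per-tensor max-abs value."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
    # STE: forward uses the quantized value, backward passes gradients through.
    return x + (q - x).detach()


class StochasticBitwidthQuantizer(nn.Module):
    """Hypothetical per-layer quantizer: a learnable DBP gives the probability
    of picking the higher of two adjacent bitwidths during training."""

    def __init__(self, low_bits=4, high_bits=8):
        super().__init__()
        self.low_bits, self.high_bits = low_bits, high_bits
        self.dbp = nn.Parameter(torch.zeros(1))  # logit of P(high_bits)

    def forward(self, w):
        p_high = torch.sigmoid(self.dbp)
        w_low = uniform_quantize(w, self.low_bits)
        w_high = uniform_quantize(w, self.high_bits)
        if self.training:
            # Sample which bitwidth to use for this forward pass; re-attach the
            # probability so the DBP receives a smooth (straight-through style)
            # gradient -- one possible relaxation, assumed for this sketch.
            sample = torch.bernoulli(p_high.detach())
            gate = sample + p_high - p_high.detach()
            return gate * w_high + (1.0 - gate) * w_low
        # Once the MPQ strategy is fixed, commit to the more probable bitwidth.
        return w_high if p_high.item() >= 0.5 else w_low
```

In use, each layer would wrap its weights with such a quantizer (e.g., quantizing `conv.weight` before the convolution); the learned `dbp` values across layers then define the per-layer bitwidth assignment, after which the network is retrained at the chosen precisions.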
