Paper Title
Automatic low-bit hybrid quantization of neural networks through meta learning
Paper Authors
Paper Abstract
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budgets. Uniform bit-width quantization across all layers is usually sub-optimal, and the exploration of hybrid quantization with different bit widths for different layers is vital for efficient deep compression. In this paper, we employ a meta learning method to automatically realize low-bit hybrid quantization of neural networks. A MetaQuantNet, together with a quantization function, is trained to generate the quantized weights for the target DNN. Then, we apply a genetic algorithm to search for the best hybrid quantization policy that meets the compression constraints. With the best searched quantization policy, we subsequently retrain or finetune to further improve the performance of the quantized target network. Extensive experiments demonstrate that the performance of the searched hybrid quantization schemes surpasses that of uniform bit-width counterparts. Compared to existing reinforcement learning (RL) based hybrid quantization search approaches that rely on tedious exploration, our meta learning approach is more efficient and effective for any compression requirement, since the MetaQuantNet only needs to be trained once.
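To make the search step concrete, the following is a minimal, self-contained sketch of a genetic algorithm exploring per-layer bit-width policies under a compression budget. It is not the paper's implementation: the layer sizes, candidate bit widths, budget, and the fitness proxy (which stands in for the accuracy of the network quantized by the MetaQuantNet) are all hypothetical placeholders.

```python
import random

# Hypothetical per-layer parameter counts and candidate bit widths
# (placeholders, not taken from the paper).
LAYER_SIZES = [864, 18432, 73728, 512000]
BIT_CHOICES = [2, 3, 4, 8]
AVG_BIT_BUDGET = 4.0  # compression constraint: size-weighted average bits

def avg_bits(policy):
    """Size-weighted average bit width of a per-layer bit policy."""
    total = sum(LAYER_SIZES)
    return sum(b * s for b, s in zip(policy, LAYER_SIZES)) / total

def fitness(policy):
    """Toy fitness: reject over-budget policies, otherwise prefer more
    bits as a crude accuracy proxy. A real search would instead evaluate
    the validation accuracy of the network quantized with this policy."""
    if avg_bits(policy) > AVG_BIT_BUDGET:
        return float("-inf")
    return sum(policy)

def mutate(policy, rate=0.3):
    """Randomly reassign some layers' bit widths."""
    return [random.choice(BIT_CHOICES) if random.random() < rate else b
            for b in policy]

def crossover(a, b):
    """Single-point crossover of two policies."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def search(pop_size=20, generations=30, seed=0):
    """Evolve a population of bit-width policies; keep the top quarter
    as elites each generation and refill with mutated crossovers."""
    random.seed(seed)
    pop = [[random.choice(BIT_CHOICES) for _ in LAYER_SIZES]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]
        children = [mutate(crossover(random.choice(elite),
                                     random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = search()
    print("best policy:", best, "avg bits:", round(avg_bits(best), 2))
```

In the paper's pipeline, evaluating `fitness` would be cheap because the MetaQuantNet generates the quantized weights for any candidate policy without retraining; the sketch above replaces that evaluation with a trivial proxy to stay self-contained.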