Paper title
Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers
Paper authors
Paper abstract
Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4 and 8 bits, of individual weight and activation tensors, under tight constraints on the RAM and Flash embedded memory sizes. We conduct an experimental analysis on the MobileNetV1, MobileNetV2 and MNasNet models for ImageNet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions quantized with a non-uniform function, which is not tailored to CPUs featuring integer-only arithmetic. This demonstrates the viability of uniform quantization, required for MCU deployment, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet thanks to the found quantization policy, being 4% more accurate than other 8-bit networks fitting the same memory constraints.
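As an illustration of the memory constraint described above, the following minimal Python sketch (not the paper's implementation; the layer sizes and function names are hypothetical) computes the Flash footprint of a per-layer bit-width policy over the weight tensors and checks it against a 2 MB MCU-class budget, the kind of feasibility test a policy search must satisfy:

```python
# Illustrative sketch, not the authors' code: given per-layer parameter
# counts and a candidate bit-width policy (2/4/8 bits per weight tensor),
# compute the quantized-weight storage and check the MCU Flash budget.

def weight_footprint_bytes(param_counts, bit_policy):
    """Total storage of the quantized weights, in bytes."""
    assert len(param_counts) == len(bit_policy)
    total_bits = sum(n * b for n, b in zip(param_counts, bit_policy))
    return total_bits // 8

def fits_flash(param_counts, bit_policy, flash_budget=2 * 1024 * 1024):
    """True if the quantized weights fit the Flash constraint (2 MB default)."""
    return weight_footprint_bytes(param_counts, bit_policy) <= flash_budget

# Hypothetical 3-layer network: 1M, 2M and 1.2M parameters.
layers = [1_000_000, 2_000_000, 1_200_000]
print(fits_flash(layers, [8, 8, 8]))  # uniform 8-bit: 4.2 MB -> False
print(fits_flash(layers, [4, 2, 4]))  # mixed policy: 1.6 MB -> True
```

In the paper's flow, the RL agent explores such policies and, per the abstract, favors those that maximize utilization of the available memory while staying under the RAM and Flash bounds.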