Paper Title

Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

Paper Authors

Cecilia Latotzke, Batuhan Balim, Tobias Gemmeke

Paper Abstract

The biggest challenge for the deployment of Deep Neural Networks (DNNs) close to the generated data on edge devices is their size, i.e., memory footprint and computational complexity. Both are significantly reduced by quantization. With the resulting lower word-length, the energy efficiency of DNNs increases proportionally. However, lower word-lengths typically cause accuracy degradation. To counteract this effect, the quantized DNN is retrained. Unfortunately, training costs up to 5000x more energy than inference of the quantized DNN. To address this issue, we propose a post-training quantization flow without the need for retraining. For this, we investigated different quantization options. Furthermore, our analysis systematically assesses the impact of reduced word-lengths of weights and activations, revealing a clear trend for the choice of word-length. Neither aspect has been systematically investigated so far. Our results are independent of the depth of the DNN and apply to uniform quantization, allowing fast quantization of a given pre-trained DNN. For ImageNet, we exceed the state of the art at 6 bit by 2.2% Top-1 accuracy. Without retraining, our quantization to 8 bit surpasses floating-point accuracy.
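The abstract refers to uniform quantization of weights and activations at a reduced word-length without retraining. The sketch below is a minimal NumPy illustration of symmetric, per-tensor uniform quantization, not the authors' exact flow; the function name uniform_quantize and the max-based scale factor are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, word_length: int) -> np.ndarray:
    """Symmetric uniform quantization of a tensor to the given word-length.

    Values are scaled to the signed integer grid of `word_length` bits,
    rounded, clipped, and mapped back to floating point (fake quantization).
    This only shows the basic uniform scheme; the paper's flow additionally
    compares different quantization options for weights and activations.
    """
    qmax = 2 ** (word_length - 1) - 1          # e.g. 127 for 8 bit, 31 for 6 bit
    scale = np.max(np.abs(x)) / qmax           # per-tensor scale factor (assumed)
    if scale == 0:
        return x
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                            # dequantized representation

# Example: quantize pre-trained weights to 6 bit without any retraining
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q6 = uniform_quantize(weights, word_length=6)
print("max abs quantization error:", np.max(np.abs(weights - w_q6)))
```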
