Paper Title
Technical Report: NEMO DNN Quantization for Deployment Model
Paper Authors
Paper Abstract
This technical report aims to define a formal framework for layer-wise quantization of Deep Neural Networks (DNNs), focusing in particular on the problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable, and IntegerDeployable) and gives formal definitions of the latter two. An important feature of this model, and of the IntegerDeployable representation in particular, is that it enables DNN inference using integers only, without resorting to real-valued numbers in any part of the computation and without relying on an explicit fixed-point numerical representation.