Title

Technical Report: NEMO DNN Quantization for Deployment Model

Author

Conti, Francesco

Abstract

This technical report aims to define a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on the problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable and IntegerDeployable), focusing in particular on a formal definition of the latter two. An important feature of this model, and of the IntegerDeployable representation in particular, is that it enables DNN inference using only integer arithmetic, without resorting to real-valued numbers in any part of the computation and without relying on an explicit fixed-point numerical representation.
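To make "integer-only inference" concrete, here is a minimal NumPy sketch of one linear layer followed by ReLU. It assumes linear quantization, where each real tensor t is represented as eps * q with a real quantum eps and an integer image q, and a requantization step implemented as an integer multiply followed by a right shift. All function names and the shift amount d = 16 are hypothetical illustrations, not NEMO's API. The real-valued quanta are used only offline to derive the integer multiplier m; the inference path itself touches only integers.

```python
import numpy as np

# Hypothetical helpers illustrating integer-only inference; NOT the NEMO API.

def quantize_act(t, eps, bits):
    # Unsigned activations: t ~= eps * q, with q clipped to [0, 2**bits - 1].
    return np.clip(np.floor(t / eps), 0, 2**bits - 1).astype(np.int64)

def quantize_weight(w, eps):
    # Signed weights: w ~= eps * q (range clipping omitted for brevity).
    return np.round(w / eps).astype(np.int64)

def requant_params(eps_acc, eps_out, d=16):
    # Offline step: integer multiplier m such that m / 2**d ~= eps_acc / eps_out.
    return int(round(eps_acc / eps_out * 2**d)), d

def integer_linear_relu(x_q, w_q, m, d, bits_out):
    acc = w_q @ x_q                           # integer accumulator, quantum eps_w * eps_x
    y = (m * acc) >> d                        # requantize: integer multiply + arithmetic right shift
    return np.clip(y, 0, 2**bits_out - 1)     # ReLU and saturation, still in the integer domain

# Usage: quanta are powers of two here so the example is exact.
eps_x, eps_w, eps_y = 1 / 16, 1 / 64, 1 / 8
x = np.array([0.5, 1.0, 0.25])
W = np.array([[0.25, -0.125, 0.5],
              [1.0, 0.5, -0.25]])

x_q = quantize_act(x, eps_x, bits=8)          # [8, 16, 4]
w_q = quantize_weight(W, eps_w)               # [[16, -8, 32], [64, 32, -16]]
m, d = requant_params(eps_w * eps_x, eps_y)   # m = 512, d = 16
y = integer_linear_relu(x_q, w_q, m, d, bits_out=8)
print(y)  # [1 7]
```

In real arithmetic W @ x = [0.125, 0.9375], and floor([0.125, 0.9375] / eps_y) = [1, 7], so the integer pipeline reproduces the output image exactly in this case. In general the multiply-and-shift only approximates the ratio eps_acc / eps_out, which is why a larger d trades accumulator width for precision.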
