Technical Report: NEMO DNN Quantization for Deployment Model

Paper title

Technical Report: NEMO DNN Quantization for Deployment Model

Paper author

Conti, Francesco

Abstract

This technical report defines a formal framework for layer-wise quantization of Deep Neural Networks (DNNs), focusing in particular on the problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable and IntegerDeployable), focusing in particular on a formal definition of the latter two. An important feature of this model, and in particular of the IntegerDeployable representation, is that it enables DNN inference using integers only, without resorting to real-valued numbers in any part of the computation and without relying on an explicit fixed-point numerical representation.
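
The idea of integer-only inference mentioned in the abstract can be illustrated with a minimal sketch. This is not NEMO's API and not necessarily the report's exact formulation: the function names (`quantize`, `integer_dot`, `requantize`), the quantum values, and the multiply-and-shift rescaling are all assumptions chosen for illustration. Each real value is assumed to be represented as an integer image `q` with `value ≈ eps * q`; once the integer images exist, the dot product and the rescaling to the output quantum use integer arithmetic only.

```python
# Illustrative sketch only; all names and constants are hypothetical,
# not taken from NEMO or from the report.

def quantize(values, eps):
    """Offline step: map real values to integer images q with value ~= eps * q."""
    return [round(v / eps) for v in values]

def integer_dot(q_x, q_w):
    """Integer-only dot product; the result has quantum eps_x * eps_w."""
    return sum(a * b for a, b in zip(q_x, q_w))

def requantize(q_acc, mul, shift):
    """Rescale an integer accumulator to a new quantum using only an
    integer multiplication and an arithmetic right shift."""
    return (q_acc * mul) >> shift

# Assumed quanta: eps_x = 1/16, eps_w = 1/32, eps_y = 1/8.
# The ratio (eps_x * eps_w) / eps_y = 1/64 is encoded as mul / 2**shift.
SHIFT = 16
MUL = 1024  # 1024 / 2**16 == 1/64

q_x = quantize([0.5, -0.25, 1.0], 1 / 16)     # [8, -4, 16]
q_w = quantize([0.25, 0.5, -0.125], 1 / 32)   # [8, 16, -4]
q_acc = integer_dot(q_x, q_w)                 # -64, i.e. -64 * (1/512) = -0.125
q_y = requantize(q_acc, MUL, SHIFT)           # -1,  i.e. -1 * (1/8)   = -0.125
```

In this sketch only `quantize` touches real numbers, and it would run once before deployment; the inference path (`integer_dot` followed by `requantize`) operates on Python integers throughout.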
