Paper Title
SPIQ: Data-Free Per-Channel Static Input Quantization
Paper Authors
Abstract
Computationally expensive neural networks are ubiquitous in computer vision, and solutions for efficient inference have drawn growing attention in the machine learning community. Examples of such solutions include quantization, i.e. converting the processed values (weights and inputs) from floating point to integers, e.g. int8 or int4. Concurrently, the rise of privacy concerns has motivated the study of less invasive acceleration methods, such as data-free quantization of the weights and activations of pre-trained models. Previous approaches either exploit statistical information to deduce scalar ranges and scaling factors for the activations in a static manner, or dynamically adapt this range on the fly for each input of each layer (the inputs are also referred to as activations): the latter is generally more accurate, at the expense of significantly slower inference. In this work, we argue that static input quantization can reach the accuracy levels of dynamic methods by means of a per-channel input quantization scheme that allows one to more finely preserve cross-channel dynamics. We show through a thorough empirical evaluation on multiple computer vision problems (e.g. ImageNet classification, Pascal VOC object detection and CityScapes semantic segmentation) that the proposed method, dubbed SPIQ, achieves accuracies rivalling dynamic approaches with static-level inference speed, significantly outperforming state-of-the-art quantization methods on every benchmark.