Paper Title

Hybrid and Non-Uniform quantization methods using retro synthesis data for efficient inference

Authors

Tej Pratap GVSL, Raja Kumar

Abstract

Existing quantization-aware training methods attempt to compensate for quantization loss by leveraging training data, as most post-training quantization methods do, and are also time-consuming. Neither approach is effective for privacy-constrained applications, since both are tightly coupled to training data. In contrast, this paper proposes a data-independent post-training quantization scheme that eliminates the need for training data. This is achieved by generating a faux dataset, hereafter referred to as Retro-Synthesis Data, from the FP32 model's layer statistics and then using it for quantization. The approach outperforms state-of-the-art methods, including but not limited to ZeroQ and DFQ, on models with and without Batch-Normalization layers, at 8-, 6-, and 4-bit precisions on the ImageNet and CIFAR-10 datasets. We also introduce two futuristic variants of post-training quantization, namely Hybrid Quantization and Non-Uniform Quantization.
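The abstract does not spell out how the Retro-Synthesis Data is generated. Prior work in this line (e.g. ZeroQ's distilled data) typically optimizes a random input batch so that the activation statistics it induces match the Batch-Normalization layers' stored running statistics. The PyTorch sketch below illustrates that general idea only; the function name `generate_retro_synthesis_data` and all hyperparameters are illustrative assumptions, not the paper's actual procedure.

```python
import torch
import torch.nn as nn

def generate_retro_synthesis_data(model: nn.Module,
                                  batch_size: int = 32,
                                  image_shape=(3, 224, 224),
                                  steps: int = 500,
                                  lr: float = 0.1) -> torch.Tensor:
    """Optimize random inputs so their per-channel activation statistics
    match each BN layer's stored running mean/variance (ZeroQ-style).
    Illustrative sketch only; not the paper's exact procedure."""
    model.eval()
    for p in model.parameters():          # freeze weights; only the input is trained
        p.requires_grad_(False)

    bn_layers, captured, handles = [], [], []

    def hook(module, inputs, output):
        x_in = inputs[0]
        # Per-channel statistics of the BN layer's input over batch + spatial dims.
        captured.append((x_in.mean(dim=(0, 2, 3)), x_in.var(dim=(0, 2, 3))))

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            bn_layers.append(m)
            handles.append(m.register_forward_hook(hook))

    x = torch.randn(batch_size, *image_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        captured.clear()
        opt.zero_grad()
        model(x)
        # Match the induced statistics against the stored running statistics.
        loss = x.new_zeros(())
        for bn, (mu, var) in zip(bn_layers, captured):
            loss = loss + torch.norm(mu - bn.running_mean) \
                        + torch.norm(var - bn.running_var)
        loss.backward()
        opt.step()

    for h in handles:
        h.remove()
    return x.detach()
```

The synthesized batch can then stand in for real calibration data in an ordinary post-training quantization pipeline, for example to pick per-tensor scale factors at the target 8-, 6-, or 4-bit precision. A minimal symmetric uniform quantizer of that kind (again an assumption, not the paper's Hybrid or Non-Uniform schemes) might look like:

```python
def symmetric_quantize(t: torch.Tensor, num_bits: int = 8):
    """Uniform symmetric quantization with a scale chosen from the
    tensor's observed range (as calibrated on the synthetic batch)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = t.abs().max() / qmax
    q = torch.clamp(torch.round(t / scale), min=-qmax - 1, max=qmax)
    return q, scale
```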
