CSMPQ: Class Separability Based Mixed-Precision Quantization

Paper Title

CSMPQ: Class Separability Based Mixed-Precision Quantization

Authors

Mingkai Wang, Taisong Jin, Miaohui Zhang, Zhengtao Yu

Abstract

Mixed-precision quantization has received increasing attention for its ability to reduce the computational burden and speed up inference. Existing methods usually focus on the sensitivity of different network layers, which requires a time-consuming search or training process. To this end, a novel mixed-precision quantization method, termed CSMPQ, is proposed. Specifically, the TF-IDF metric that is widely used in natural language processing (NLP) is introduced to measure the class separability of layer-wise feature maps. Furthermore, a linear programming problem is designed to derive the optimal bit configuration for each layer. Without any iterative process, the proposed CSMPQ achieves better compression trade-offs than state-of-the-art quantization methods. Specifically, CSMPQ achieves 73.03% Top-1 accuracy on ResNet-18 with only 59G BOPs for QAT, and 71.30% Top-1 accuracy with only 1.5Mb on MobileNetV2 for PTQ.
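
The bit-allocation step can be illustrated with a toy example. The abstract does not give the actual formulation, so the sketch below is only an assumption-laden illustration: it assumes each layer already has a scalar importance score (standing in for the class-separability measure), assumes the objective is the sum of score times bit-width, and assumes a model-size budget as the only constraint. It also swaps the linear program for exhaustive search over a tiny set of layers, so the function name `allocate_bits`, its parameters, and the example numbers are all hypothetical and not taken from the paper.

```python
from itertools import product


def allocate_bits(scores, param_counts, budget_bits, choices=(2, 4, 8)):
    """Pick one bit-width per layer by exhaustive search (toy sizes only).

    Assumed objective (not from the paper): maximize sum(score * bits)
    subject to sum(param_count * bits) <= budget_bits.
    Returns a tuple of bit-widths, or None if no configuration fits.
    """
    best_config, best_value = None, float("-inf")
    for config in product(choices, repeat=len(scores)):
        # Total weight storage of this configuration, in bits.
        size = sum(p * b for p, b in zip(param_counts, config))
        if size > budget_bits:
            continue
        # Layers with higher scores are rewarded for keeping more bits.
        value = sum(s * b for s, b in zip(scores, config))
        if value > best_value:
            best_config, best_value = config, value
    return best_config


# Hypothetical per-layer scores and parameter counts for three layers.
scores = [0.9, 0.2, 0.5]
param_counts = [1000, 4000, 2000]
config = allocate_bits(scores, param_counts, budget_bits=28000)
print(config)
```

Under these assumptions, the small, high-scoring first layer keeps the most bits and the large, low-scoring second layer is compressed hardest. Exhaustive search grows exponentially with the number of layers, which is one reason a linear programming formulation is attractive for real networks.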
