Paper Title
CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers
Paper Authors
Paper Abstract
When considering post-training quantization, prior work has typically focused on developing a mixed-precision scheme or learning the best way to partition a network for quantization. In our work, CPT-V, we look at a general way to improve the accuracy of networks that have already been quantized, simply by perturbing the quantization scales. Borrowing the idea of contrastive loss from self-supervised learning, we find a robust way to jointly minimize a loss function using just 1,000 calibration images. To determine the best-performing quantization scales, CPT-V contrasts the features of quantized and full-precision models in a self-supervised fashion. Unlike traditional reconstruction-based loss functions, the use of a contrastive loss function not only rewards similarity between the quantized and full-precision outputs but also helps distinguish the quantized output from other outputs within a given batch. In addition, in contrast to prior works, CPT-V proposes a block-wise evolutionary search to minimize a global contrastive loss objective, allowing for accuracy improvements of existing vision transformer (ViT) quantization schemes. For example, CPT-V improves the top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% at 3-bit, 4-bit, and 8-bit weight quantization levels, respectively. Extensive experiments on a variety of other ViT architectures further demonstrate its robustness in extreme quantization scenarios. Our code is available at <link>.
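For illustration, below is a minimal PyTorch sketch of the contrastive objective described in the abstract, in which the quantized and full-precision features of the same calibration image form the positive pair and the other images in the batch serve as in-batch negatives. The function name, the temperature value, and the InfoNCE-style formulation are assumptions made for exposition, not the paper's exact loss.

import torch
import torch.nn.functional as F

def contrastive_calibration_loss(q_feats, fp_feats, temperature=0.1):
    # q_feats, fp_feats: (batch, dim) features from the quantized and
    # full-precision models for the same batch of calibration images.
    # NOTE: illustrative sketch; the exact loss form used by CPT-V is an assumption here.
    q = F.normalize(q_feats, dim=-1)
    fp = F.normalize(fp_feats, dim=-1)
    logits = q @ fp.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    # Diagonal entries are the positive (same-image) pairs; off-diagonal entries
    # are the in-batch negatives that the quantized output is pushed away from.
    return F.cross_entropy(logits, targets)

In a calibration loop, this loss would be evaluated on the ~1,000 calibration images while the candidate quantization scales proposed by the evolutionary search are scored, with the full-precision features held fixed.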