Paper Title
Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach
Paper Authors
Paper Abstract
Conventional model quantization methods apply a fixed quantization scheme to all data samples, ignoring the inherent differences in "recognition difficulty" among samples. We propose to process different data samples with different quantization schemes to achieve data-dependent dynamic inference at a fine-grained, layer-wise level. However, enabling such adaptive inference with changeable layer-wise quantization schemes is challenging, because the number of bit-width and layer combinations grows exponentially, making it extremely difficult to train a single model over such a vast search space and use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), in which the bit-widths of a single deep network can change at runtime for different data samples, with layer-wise granularity. Specifically, we first build a weight-shared, layer-wise quantizable "super-network" in which each layer can be allocated multiple bit-widths and thus quantized differently on demand. The super-network provides a vast number of bit-width and layer combinations, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, the runtime bit-width selection for each layer is modeled as a Markov Decision Process (MDP) and solved by an adaptive inference strategy. Experiments show that the super-network can be built without accuracy degradation, and that the bit-width allocation of each layer can be adjusted on the fly to handle various inputs. On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% BitOps.
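The abstract describes two components: a weight-shared super-network whose layers can be quantized to different bit-widths on demand, and a per-layer, sequential bit-width decision at inference time in the spirit of an MDP policy. Below is a minimal PyTorch sketch of these two ideas, not the authors' implementation; all names (`SwitchableQuantConv2d`, `BitPolicy`, `fake_quantize`, the bit-width choices) and the use of symmetric uniform fake quantization with a greedy policy are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code):
# (1) a layer sharing one set of FP weights that can be fake-quantized to any
#     of several bit-widths at forward time, and
# (2) a per-sample, layer-by-layer bit-width selection loop where the pooled
#     feature map acts as the "state" and the chosen bit-width as the "action".
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization of a weight tensor (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


class SwitchableQuantConv2d(nn.Conv2d):
    """Conv layer with one shared set of weights, quantizable on demand."""

    def forward(self, x: torch.Tensor, bits: int = 8) -> torch.Tensor:
        w_q = fake_quantize(self.weight, bits)           # quantize shared weights
        return F.conv2d(x, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


class BitPolicy(nn.Module):
    """Tiny policy head: maps pooled features (state) to a bit-width (action)."""

    def __init__(self, channels: int, bit_choices=(2, 4, 8)):
        super().__init__()
        self.bit_choices = bit_choices
        self.fc = nn.Linear(channels, len(bit_choices))

    def forward(self, feat: torch.Tensor) -> int:
        state = feat.mean(dim=(2, 3))                    # global average pooling
        action = self.fc(state).argmax(dim=1)[0].item()  # greedy action for brevity
        return self.bit_choices[action]


# Per-sample adaptive inference: decide each layer's bit-width sequentially.
layers = nn.ModuleList([SwitchableQuantConv2d(3, 16, 3, padding=1),
                        SwitchableQuantConv2d(16, 16, 3, padding=1)])
policies = nn.ModuleList([BitPolicy(3), BitPolicy(16)])

x = torch.randn(1, 3, 32, 32)                            # one input sample
for layer, policy in zip(layers, policies):
    bits = policy(x)                                     # state -> bit-width action
    x = F.relu(layer(x, bits=bits))                      # run layer at chosen bits
```

In this sketch the policy is evaluated greedily per layer; in the paper the decision process is trained as an MDP, so the selection strategy would be learned rather than hand-wired as above.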