Paper Title
AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets
Paper Authors
Paper Abstract
This paper studies Binary Neural Networks (BNNs), in which both weights and activations are binarized into 1-bit values, greatly reducing memory usage and computational complexity. Since modern deep neural networks adopt sophisticated designs with complex architectures for the sake of accuracy, the distributions of their weights and activations are highly diverse. As a result, the conventional sign function cannot effectively binarize the full-precision values in BNNs. To this end, we present a simple yet effective approach called AdaBin that adaptively obtains the optimal binary set $\{b_1, b_2\}$ ($b_1, b_2\in \mathbb{R}$) of weights and activations for each layer, instead of a fixed set (\textit{i.e.}, $\{-1, +1\}$). In this way, the proposed method can better fit different distributions and increase the representation ability of binarized features. In practice, we use the center position and distance of the 1-bit values to define a new binary quantization function. For the weights, we propose an equalization method that aligns the symmetric center of the binary distribution to that of the real-valued distribution and minimizes the Kullback-Leibler divergence between them. Meanwhile, we introduce a gradient-based optimization method to obtain these two parameters for the activations, which are jointly trained in an end-to-end manner. Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin achieves state-of-the-art performance. For instance, we obtain 66.4% Top-1 accuracy on ImageNet with the ResNet-18 architecture, and 69.4 mAP on PASCAL VOC with SSD300. The PyTorch code is available at \url{https://github.com/huawei-noah/Efficient-Computing/tree/master/BinaryNetworks/AdaBin} and the MindSpore code is available at \url{https://gitee.com/mindspore/models/tree/master/research/cv/AdaBin}.
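To make the idea of an adaptive binary set more concrete, the PyTorch sketch below parameterizes the quantizer by a center $\beta$ and a distance $\alpha$, so that values map to $\{\beta-\alpha, \beta+\alpha\}$ rather than the fixed $\{-1, +1\}$. This is a minimal illustration based only on the abstract: the class name `AdaBinQuantizer`, the initialization values, and the use of the weight mean and standard deviation as stand-ins for the paper's KL-divergence-based equalization are assumptions, not details taken from the released code.

```python
import torch
import torch.nn as nn


class AdaBinQuantizer(nn.Module):
    """Sketch of an adaptive binary quantizer: values are mapped onto the set
    {beta - alpha, beta + alpha} instead of the fixed {-1, +1}."""

    def __init__(self, learnable: bool = True):
        super().__init__()
        self.learnable = learnable
        if learnable:
            # Activation case: center/distance are trained end-to-end
            # (initial values here are assumptions).
            self.alpha = nn.Parameter(torch.ones(1))
            self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.learnable:
            alpha, beta = self.alpha, self.beta
        else:
            # Weight case: align the center with the real-valued mean and derive
            # the distance from the spread of the distribution (a simplifying
            # assumption standing in for the KL-divergence-based equalization).
            beta = x.mean()
            alpha = x.std()
        # Straight-through estimator: the forward pass uses sign(), while the
        # backward pass treats the quantizer as the identity around the center.
        centered = x - beta
        binary = (torch.sign(centered) - centered).detach() + centered
        return alpha * binary + beta


# Usage sketch: learnable {alpha, beta} for activations, statistics-derived for weights.
act_quant = AdaBinQuantizer(learnable=True)
weight_quant = AdaBinQuantizer(learnable=False)
x = torch.randn(8, 16)
x_bin = act_quant(x)   # each entry lies in {beta - alpha, beta + alpha}
```

Because $\alpha$ and $\beta$ for the activations are ordinary parameters, they receive gradients through the straight-through estimator and can be trained jointly with the rest of the network, matching the end-to-end optimization described in the abstract.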