Paper Title
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
Paper Authors
Paper Abstract
The accuracy of deep convolutional neural networks (CNNs) generally improves when fueled with high resolution images. However, this often comes at a high computational cost and high memory footprint. Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically selected from the original image with reinforcement learning. Such a dynamic decision process naturally facilitates adaptive inference at test time, i.e., it can be terminated once the model is sufficiently confident about its prediction and thus avoids further redundant computation. Notably, our framework is general and flexible as it is compatible with most of the state-of-the-art light-weighted CNNs (such as MobileNets, EfficientNets and RegNets), which can be conveniently deployed as the backbone feature extractor. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy. Code and pre-trained models are available at https://github.com/blackfeather-wang/GFNet-Pytorch.
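The adaptive inference described in the abstract can be illustrated with a minimal sketch: crops are processed one at a time, and the loop terminates as soon as the prediction confidence crosses a threshold, skipping the remaining computation. This is an illustrative toy, not the authors' implementation: the `classifier` callable stands in for the paper's CNN backbone (e.g. MobileNet-V3) plus its aggregation head, and the names and threshold value are assumptions for demonstration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_classify(crops, classifier, threshold=0.9):
    """Classify from a sequence of image crops with early termination.

    `crops` is an ordered sequence of inputs (in the paper, regions
    strategically selected from the original image); `classifier` maps
    a crop to class logits, standing in for the backbone feature
    extractor and prediction head. Returns (predicted_class, crops_used).
    """
    probs, step = None, 0
    for step, crop in enumerate(crops, start=1):
        probs = softmax(classifier(crop))
        # Once the model is sufficiently confident, stop: the remaining
        # crops are never processed, which is the source of the savings.
        if max(probs) >= threshold:
            break
    return probs.index(max(probs)), step
```

For example, with a toy two-class classifier whose confidence grows as more informative crops arrive, the loop exits after the second crop and the third is never evaluated.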