Paper Title
RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
Paper Authors
Paper Abstract
Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices typically used for inference on the edge. Aggressively downsampling the images via pooling or strided convolutions can address the problem, but leads to a significant decrease in accuracy due to gross aggregation of the feature map by standard pooling operators. In this paper, we introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs), that efficiently aggregates features over large patches of an image and rapidly downsamples activation maps. Empirical evaluation indicates that an RNNPool layer can effectively replace multiple blocks in a variety of architectures such as MobileNets and DenseNet when applied to standard vision tasks like image classification and face detection. That is, RNNPool can significantly decrease computational complexity and peak memory usage for inference while retaining comparable accuracy. We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art mAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM. Code is released at https://github.com/Microsoft/EdgeML.
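For intuition, below is a minimal PyTorch sketch of an RNNPool-style operator, not the authors' implementation (the official one lives in the EdgeML repository and is built on FastGRNN cells). The class name RNNPoolSketch, the hidden sizes, and the substitution of plain GRU cells are illustrative assumptions. One RNN summarizes each row and each column of a patch, a second bidirectional RNN aggregates those summaries, and the concatenation of its four final hidden states forms the pooled output.

```python
import torch
import torch.nn as nn


class RNNPoolSketch(nn.Module):
    """Illustrative RNN-based pooling over an (r x c) patch of an activation map.

    Simplifying assumption: plain GRU cells stand in for the FastGRNN cells
    used by the paper; hidden sizes are arbitrary.
    """

    def __init__(self, in_channels, hidden1=16, hidden2=16):
        super().__init__()
        # First RNN summarizes each row and each column of the patch.
        self.rnn1 = nn.GRU(in_channels, hidden1, batch_first=True)
        # Second (bidirectional) RNN aggregates the row/column summaries.
        self.rnn2 = nn.GRU(hidden1, hidden2, batch_first=True, bidirectional=True)
        self.out_channels = 4 * hidden2

    def _summarize(self, seqs):
        # seqs: (N, L, C) -> final hidden state (N, hidden1)
        _, h = self.rnn1(seqs)
        return h[-1]

    def forward(self, patch):
        # patch: (B, C, r, c) -- one activation-map patch per sample
        b, ch, r, c = patch.shape
        x = patch.permute(0, 2, 3, 1)                        # (B, r, c, C)

        rows = x.reshape(b * r, c, ch)                       # each row as a sequence
        cols = x.permute(0, 2, 1, 3).reshape(b * c, r, ch)   # each column as a sequence
        row_sum = self._summarize(rows).reshape(b, r, -1)    # (B, r, hidden1)
        col_sum = self._summarize(cols).reshape(b, c, -1)    # (B, c, hidden1)

        # Bidirectional pass over the row summaries and over the column
        # summaries; the four final states give a 4*hidden2 feature vector.
        _, h_rows = self.rnn2(row_sum)                       # (2, B, hidden2)
        _, h_cols = self.rnn2(col_sum)                       # (2, B, hidden2)
        return torch.cat([h_rows[0], h_rows[1], h_cols[0], h_cols[1]], dim=1)


if __name__ == "__main__":
    pool = RNNPoolSketch(in_channels=8)
    patch = torch.randn(2, 8, 8, 8)      # batch of 2, 8 channels, 8x8 patch
    print(pool(patch).shape)             # torch.Size([2, 64])
```

In a full network, such an operator would be slid with a stride over the activation map so that each patch collapses to a single output vector, which is how a single RNNPool layer can take the place of several convolution-and-pooling blocks and keep the peak intermediate activation size small.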