Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays

Paper Title

Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays

Authors

Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek, Walterio Mayol-Cuevas

Abstract

We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities. PPA sensors consist of an array of processing elements (PEs), with each PE capable of light capture, data storage and computation, allowing a range of computer vision processing to be executed directly on the sensor device. The key idea behind our approach is to store network weights "in-pixel" within the PEs of the PPA sensor itself, so that various computations, such as multiple different image convolutions, can be carried out in parallel. Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely on the PPA sensor, while leaving no untapped computational resources. This is in contrast to previous works, which use sensor-level processing only to compute image convolutions sequentially and must transfer data to an external digital processor to complete the computation. We demonstrate our approach on the SCAMP-5 vision system, performing inference of an MNIST digit classification network at over 3000 frames per second with over 93% classification accuracy. This is the first work demonstrating CNN inference conducted entirely on the processor array of a PPA vision sensor device, requiring no external processing.
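
The "in-pixel" weight idea in the abstract can be illustrated with a small NumPy simulation. This is only a sketch under stated assumptions, not the authors' SCAMP-5 implementation: the block layout, the ternary weights, the array sizes, and the function names (`tile_image`, `weight_planes`, `parallel_convolve`) are all hypothetical choices made for illustration. The input image is replicated across blocks of a simulated PE array, each block stores a different kernel in its own PEs, and a single sequence of array-wide shift-and-accumulate steps then computes every convolution at once.

```python
import numpy as np


def tile_image(image, grid):
    """Replicate one image across a grid x grid arrangement of blocks."""
    return np.tile(image, (grid, grid))


def weight_planes(kernels, block, grid):
    """Build one array-sized weight plane per kernel offset.

    kernels has shape (grid * grid, k, k). In plane [dy, dx], every
    simulated PE inside block b holds kernels[b, dy, dx], so the weights
    are stored "in-pixel" rather than broadcast from outside the array.
    """
    k = kernels.shape[-1]
    planes = np.zeros((k, k, grid * block, grid * block))
    for b, kern in enumerate(kernels):
        r, c = divmod(b, grid)
        rows = slice(r * block, (r + 1) * block)
        cols = slice(c * block, (c + 1) * block)
        planes[:, :, rows, cols] = kern[:, :, None, None]
    return planes


def parallel_convolve(array_image, planes):
    """Apply a different kernel in each block using array-wide operations.

    Each step shifts the whole array by one kernel offset, multiplies it
    element-wise by the locally stored weight, and accumulates. The result
    is a cross-correlation (the usual CNN "convolution") per block.
    """
    k = planes.shape[0]
    half = k // 2
    acc = np.zeros(array_image.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            shifted = np.roll(array_image, (half - dy, half - dx), axis=(0, 1))
            acc += shifted * planes[dy, dx]
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))                        # assumed toy input size
    kernels = rng.integers(-1, 2, size=(4, 3, 3))     # assumed ternary weights
    array_image = tile_image(image, grid=2)           # 2 x 2 blocks of 8 x 8 PEs
    planes = weight_planes(kernels, block=8, grid=2)
    feature_maps = parallel_convolve(array_image, planes)
    print(feature_maps.shape)
```

Because the image copies are tiled, the wrap-around of `np.roll` at block borders behaves like a circular boundary within each block; only the interior of each block matches an ordinary "valid" convolution. A real sensor would also face analog noise and limited register storage, which this sketch ignores.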
