Paper Title

Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking Neural Networks

Authors

Jan Sommer, M. Akif Özkan, Oliver Keszocze, Jürgen Teich

Abstract

Spiking Neural Networks (SNNs) compute in an event-based manner to achieve more efficient computation than standard Neural Networks. In SNNs, neuronal outputs (i.e. activations) are not encoded as real-valued activations but as sequences of binary spikes. The motivation for using SNNs over conventional neural networks is rooted in the special computational aspects of SNNs, especially the very high degree of sparsity of neural output activations. Well-established architectures for conventional Convolutional Neural Networks (CNNs) feature large spatial arrays of Processing Elements (PEs) that remain highly underutilized in the face of activation sparsity. We propose a novel architecture that is optimized for the processing of Convolutional SNNs (CSNNs) featuring a high degree of activation sparsity. In our architecture, the main strategy is to use fewer but highly utilized PEs. The PE array used to perform the convolution is only as large as the kernel size, allowing all PEs to be active as long as there are spikes to process. This constant flow of spikes is ensured by compressing the feature maps (i.e. the activations) into queues that can then be processed spike by spike. This compression is performed at run time using dedicated circuitry, leading to self-timed scheduling. This allows the processing time to scale directly with the number of spikes. A novel memory organization scheme called memory interlacing is used to efficiently store and retrieve the membrane potentials of the individual neurons using multiple small parallel on-chip RAMs. Each RAM is hardwired to its PE, reducing switching circuitry and allowing RAMs to be located in close proximity to the respective PE. We implemented the proposed architecture on an FPGA and achieved a significant speedup compared to other implementations while requiring fewer hardware resources and maintaining lower energy consumption.
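To make the queue-based processing scheme concrete, below is a minimal Python sketch (an illustration only, not the authors' FPGA design): the binary feature map is compressed into a queue of spike coordinates, and each queued spike triggers K x K weight accumulations, so runtime scales with the number of spikes rather than with the feature-map size. Function and variable names (compress_to_spike_queue, event_driven_conv, membrane) are hypothetical.

```python
# Software sketch of event-driven convolution for a single input channel.
# In hardware, the queue is filled at run time by dedicated circuitry and the
# K x K accumulations are carried out by the K x K PE array in parallel.
import numpy as np

def compress_to_spike_queue(feature_map):
    """Compress a binary feature map into a queue of (row, col) spike events."""
    rows, cols = np.nonzero(feature_map)
    return list(zip(rows, cols))

def event_driven_conv(feature_map, kernel, threshold=1.0):
    """Process the feature map spike by spike; work scales with the spike count."""
    K = kernel.shape[0]                       # PE array is only K x K
    H, W = feature_map.shape
    membrane = np.zeros((H, W))               # membrane potentials of output neurons

    for (r, c) in compress_to_spike_queue(feature_map):
        # One spike activates all K*K PEs: each scatters one kernel weight onto
        # the membrane potential of one neighbouring output neuron.
        for dr in range(K):
            for dc in range(K):
                rr, cc = r - dr + K // 2, c - dc + K // 2
                if 0 <= rr < H and 0 <= cc < W:
                    membrane[rr, cc] += kernel[dr, dc]

    out_spikes = (membrane >= threshold).astype(np.uint8)  # fire above threshold
    return out_spikes, membrane

if __name__ == "__main__":
    fmap = (np.random.rand(8, 8) > 0.9).astype(np.uint8)   # ~10% active: high sparsity
    kernel = 0.4 * np.ones((3, 3))
    spikes, potentials = event_driven_conv(fmap, kernel)
```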
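The memory-interlacing idea can likewise be sketched in software. The sketch below assumes one plausible bank-assignment rule (neuron (r, c) is stored in bank (r mod K, c mod K)); with that rule, the K x K neighbourhood touched by any single spike maps to K*K distinct banks, so each hardwired PE-RAM pair can read and update its potential in parallel without bank conflicts. The class and method names are illustrative, not taken from the paper.

```python
# Sketch of membrane-potential storage spread over K*K small, independent RAM
# banks (one per PE). The bank-selection formula is an assumption for
# illustration; the paper's exact mapping may differ.
import numpy as np

class InterlacedMembraneMemory:
    def __init__(self, height, width, k):
        self.k = k
        self.height, self.width = height, width
        # One small "RAM" per PE; each bank holds roughly (H*W)/(K*K) neurons.
        bank_h = (height + k - 1) // k
        bank_w = (width + k - 1) // k
        self.banks = np.zeros((k * k, bank_h, bank_w))

    def _locate(self, r, c):
        """Map neuron (r, c) to (bank index, address within that bank)."""
        bank = (r % self.k) * self.k + (c % self.k)
        return bank, (r // self.k, c // self.k)

    def accumulate(self, r, c, value):
        """Add 'value' to the membrane potential of neuron (r, c)."""
        if 0 <= r < self.height and 0 <= c < self.width:
            bank, (ar, ac) = self._locate(r, c)
            self.banks[bank][ar, ac] += value

    def read(self, r, c):
        bank, (ar, ac) = self._locate(r, c)
        return self.banks[bank][ar, ac]
```

Because any K consecutive rows cover all K residues modulo K (and likewise for columns), the K*K neurons updated by one spike always land in K*K different banks, which is what allows each RAM to be hardwired to a single PE without switching circuitry between them.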
