Paper Title

DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware

Paper Authors

Alberto Marchisio, Vojtech Mrazek, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract

Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently proposed by the Google Brain team, Capsule Networks (CapsNets) have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules and their preservation of the spatial relationships between different objects. However, they pose significantly high computational and memory requirements, making their energy-efficient inference a challenging task. This paper provides, for the first time, an in-depth analysis to highlight the design- and management-related challenges for the (on-chip) memories deployed in hardware accelerators executing fast CapsNets inference. To enable an efficient design, we propose an application-specific memory hierarchy, which minimizes the off-chip memory accesses while efficiently feeding the data to the hardware accelerator. We analyze the corresponding on-chip memory requirements and leverage this analysis to propose a novel methodology for exploring different scratchpad memory designs and their energy/area trade-offs. Afterwards, an application-specific power-gating technique is proposed to further reduce the energy consumption, depending upon the utilization across different operations of the CapsNets. Our results for a selected Pareto-optimal solution demonstrate no performance loss and an energy reduction of 79% for the complete accelerator, including computational units and memories, when compared to a state-of-the-art design executing Google's CapsNet model for the MNIST dataset.
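
The abstract mentions exploring different scratchpad memory designs and selecting a Pareto-optimal configuration from their energy/area trade-offs. As a rough illustration only (not the paper's actual methodology or data), the Python sketch below filters a set of hypothetical memory configurations, with placeholder names and numbers, down to those that are Pareto-optimal in energy and area:

# Illustrative sketch: Pareto filtering of hypothetical scratchpad
# memory configurations by energy and area. Names and values are
# placeholders, not results from the DESCNet paper.

def pareto_optimal(configs):
    # Keep only configurations that no other configuration dominates,
    # i.e., none is better-or-equal in both metrics and strictly
    # better in at least one.
    front = []
    for c in configs:
        dominated = any(
            o["energy_mJ"] <= c["energy_mJ"]
            and o["area_mm2"] <= c["area_mm2"]
            and (o["energy_mJ"] < c["energy_mJ"] or o["area_mm2"] < c["area_mm2"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical candidate configurations (placeholder values).
candidates = [
    {"name": "shared_spm",   "energy_mJ": 4.1, "area_mm2": 2.9},
    {"name": "separate_spm", "energy_mJ": 3.2, "area_mm2": 3.6},
    {"name": "hybrid_spm",   "energy_mJ": 3.5, "area_mm2": 3.1},
]

for cfg in pareto_optimal(candidates):
    print(cfg["name"], cfg["energy_mJ"], "mJ,", cfg["area_mm2"], "mm^2")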
