Paper Title

Heterogeneous Multi-core Array-based DNN Accelerator

Authors

Mohammad Ali Maleki, Mehdi Kamal, Ali Afzali-Kusha

Abstract

In this article, we investigate the impact of the architectural parameters of array-based DNN accelerators on an accelerator's energy consumption and performance across a wide variety of network topologies. For this purpose, we have developed a tool that simulates the execution of neural networks on array-based accelerators and is capable of testing different configurations to estimate energy consumption and processing latency. Based on our analysis of the behavior of benchmark networks under different architectural parameters, we offer several recommendations for an efficient yet high-performance accelerator design. Next, we propose a heterogeneous multi-core chip scheme for deep neural network execution. Evaluations over a small, selective search space indicate that executing a neural network on its near-optimal core configuration can save up to 36% in energy consumption and 67% in energy-delay product. We also propose an algorithm that distributes the processing of a network's layers across multiple cores of the same type to speed up computation through model parallelism. Evaluations on different networks and with different numbers of cores verify the effectiveness of the proposed algorithm in accelerating processing to near-optimal levels.
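The abstract does not specify the layer-distribution algorithm itself, but the core idea of model parallelism it describes can be sketched as a load-balancing problem: split a network's layers into contiguous groups, one per core, so that the slowest core finishes as early as possible. The sketch below is a hypothetical illustration (the function name, per-layer cost model, and binary-search formulation are assumptions, not the paper's method):

```python
def partition_layers(costs, num_cores):
    """Hypothetical sketch of layer distribution for model parallelism.

    `costs` holds a per-layer latency estimate (e.g. from a simulator such
    as the one the paper describes). The layers are split into at most
    `num_cores` contiguous groups; a binary search over the candidate
    makespan, with a greedy feasibility check, finds the minimum possible
    per-core latency bound. Returns that near-optimal bound.
    """
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        # Greedily pack layers into groups whose total cost stays <= mid.
        groups, current = 1, 0
        for c in costs:
            if current + c > mid:
                groups += 1   # start a new group on the next core
                current = c
            else:
                current += c
        if groups <= num_cores:
            hi = mid          # feasible: try a tighter bound
        else:
            lo = mid + 1      # infeasible: relax the bound
    return lo

# Example: five layers split across two cores.
# Groups [4, 2, 6] and [3, 5] give a per-core bound of 12.
print(partition_layers([4, 2, 6, 3, 5], 2))  # → 12
```

Contiguous grouping keeps inter-core communication limited to the boundary between adjacent layer groups, which is the usual motivation for pipelined model parallelism on multi-core accelerators.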
