Paper Title
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolution of each CNN layer is mapped to a matrix multiplication that includes all input features and kernels of the layer and is computed on a systolic array. In this work, we focus on the design of a systolic array with a configurable pipeline, with the goal of selecting the best pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in either normal or shallow pipeline mode, thus balancing execution time in cycles against operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power for the same applications, thus offering a combined energy-delay-product efficiency of 1.4x to 1.8x.
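The sketch below is a minimal toy model (not taken from the paper) of the per-layer trade-off the abstract describes: each CNN layer's convolution is flattened to a matrix multiplication, its latency is estimated under a deeply pipelined ("normal") mode and a shallow ("transparent") mode, and the faster mode is kept. All constants (clock periods, pipeline depths, array size, layer shapes) are hypothetical placeholders, not values reported in the paper.

```python
# Toy model of per-layer pipeline-mode selection for a systolic array.
# All constants (clock periods, pipeline depths, array size) are illustrative
# assumptions, not the timing or power model of the ArrayFlex paper.

from dataclasses import dataclass

@dataclass
class PipelineMode:
    name: str
    clock_period_ns: float  # assumed: shallow (transparent) mode runs a slower clock
    pipeline_depth: int     # assumed: register stages crossed per PE hop

# Hypothetical configurations: "normal" is deeply pipelined (fast clock, long
# fill/drain), "shallow" bypasses pipeline registers (slow clock, short fill/drain).
MODES = [
    PipelineMode("normal",  clock_period_ns=1.0, pipeline_depth=2),
    PipelineMode("shallow", clock_period_ns=1.6, pipeline_depth=1),
]

def layer_latency_ns(mode: PipelineMode, M: int, N: int, K: int,
                     array_dim: int = 32) -> float:
    """Rough latency of an (M x K) by (K x N) matrix product tiled onto an
    array_dim x array_dim systolic array: steady-state cycles grow with K,
    fill/drain cycles grow with the pipeline depth."""
    tiles = -(-M // array_dim) * -(-N // array_dim)            # ceiling division
    cycles_per_tile = K + mode.pipeline_depth * 2 * array_dim  # compute + fill/drain
    return tiles * cycles_per_tile * mode.clock_period_ns

def best_mode_for_layer(M: int, N: int, K: int) -> PipelineMode:
    """Keep the pipeline mode with the lowest estimated latency for this layer."""
    return min(MODES, key=lambda m: layer_latency_ns(m, M, N, K))

# Example: a conv layer flattened to GEMM via im2col, e.g. 64 output channels,
# 3x3x3 kernels (K = 27) over a 56x56 output feature map (N = 3136).
M, K, N = 64, 3 * 3 * 3, 56 * 56
mode = best_mode_for_layer(M, N, K)
print(f"chosen mode: {mode.name}, "
      f"latency ~ {layer_latency_ns(mode, M, N, K) / 1e3:.1f} us")
```

In this toy setting the shallow mode wins for layers whose inner dimension K is small, where pipeline fill and drain dominate, while large-K layers favor the faster clock of the normal mode; this mirrors the kind of per-layer decision the abstract describes, without reproducing the paper's actual model.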