Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

Title

Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

Authors

Sairam Sri Vatsavai, Ishan G. Thakkar

Abstract

Photonic Microring Resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep Convolutional Neural Networks (CNNs). However, previous MRR-based CNN accelerators fail to adapt efficiently to CNNs with mixed-sized tensors. One example of such CNNs is depthwise separable CNNs. Performing inference of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency of the accelerators. In this paper, we present a novel way of introducing reconfigurability in MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator hardware components and the CNN tensors that are processed using those components. We classify the state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placement of the hardware components used in the accelerators. We then use our method to introduce reconfigurability in accelerators from these two categories, and thereby improve their parallelism, their flexibility in efficiently mapping tensors of different sizes, their speed, and their overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works under an area-proportionate outlook (equal hardware area for all accelerators). Our evaluation on the inference of four modern CNNs indicates that our reconfigurable CNN accelerators provide improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work.
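
The utilization problem described in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not the authors' model: the unit sizes (36 and 9), the function names, and the assumption that a reconfigurable unit can simply be split into equal sub-units are all hypothetical, chosen only to show why a fixed-size dot-product unit wastes capacity on short tensors such as 3x3 depthwise kernels.

```python
import math

def utilization(vector_len: int, unit_size: int) -> float:
    """Fraction of occupied slots when a dot product of length vector_len
    is tiled onto units that each hold unit_size elements (hypothetical model)."""
    units_needed = math.ceil(vector_len / unit_size)
    return vector_len / (units_needed * unit_size)

def best_utilization(vector_len: int, granularities: list[int]) -> float:
    """Assumed reconfigurable case: pick the granularity that wastes the least."""
    return max(utilization(vector_len, g) for g in granularities)

# Hypothetical sizes: a fixed unit of 36 elements that can be split into 4 x 9.
FIXED_UNIT = 36
SUB_UNIT = 9

depthwise_len = 3 * 3   # one 3x3 depthwise kernel
pointwise_len = 256     # one 1x1 kernel over 256 input channels (assumed)

for name, length in [("depthwise", depthwise_len), ("pointwise", pointwise_len)]:
    fixed = utilization(length, FIXED_UNIT)
    reconf = best_utilization(length, [FIXED_UNIT, SUB_UNIT])
    print(f"{name}: fixed={fixed:.2f}, reconfigurable={reconf:.2f}")
```

Under these assumed numbers, the short depthwise tensor occupies only a quarter of a fixed 36-element unit but fills a 9-element sub-unit completely, which is the kind of size mismatch the abstract attributes to inflexible accelerators.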
