Paper Title

Learned Hardware/Software Co-Design of Neural Accelerators

Paper Authors

Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

Paper Abstract

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning. Because the design space of deep learning software stacks and hardware accelerators is diverse and vast, prior work considers software optimizations separately from hardware architectures, effectively reducing the search space. Unfortunately, this bifurcated approach means that many profitable design points are never explored. This paper instead casts the problem as hardware/software co-design, with the goal of automatically identifying desirable points in the joint design space. The key to our solution is a new constrained Bayesian optimization framework that avoids invalid solutions by exploiting the highly constrained features of this design space, which are semi-continuous/semi-discrete. We evaluate our optimization framework by applying it to a variety of neural models, improving the energy-delay product by 18% (ResNet) and 40% (DQN) over hand-tuned state-of-the-art systems, as well as demonstrating strong results on other neural network architectures, such as MLPs and Transformers.
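The abstract describes constrained Bayesian optimization over a joint hardware/software design space that mixes discrete and continuous parameters. The sketch below is a minimal illustration of that general idea, not the paper's implementation: the design knobs (pe_rows, pe_cols, buffer_kb, clock_ghz), the feasibility check, and the edp() cost model are all hypothetical stand-ins for the paper's actual design space and simulator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def sample_design():
    # One candidate point: three discrete hardware knobs plus one
    # continuous knob (all names are hypothetical).
    return np.array([
        rng.choice([4, 8, 16, 32]),       # pe_rows
        rng.choice([4, 8, 16, 32]),       # pe_cols
        rng.choice([64, 128, 256, 512]),  # buffer_kb
        rng.uniform(0.5, 2.0),            # clock_ghz
    ])

def is_feasible(x):
    # Stand-in constraint: the PE array must fit an area budget.
    return x[0] * x[1] <= 512

def edp(x):
    # Placeholder cost model; a real system would query a simulator
    # for the energy-delay product of this design point.
    energy = x[0] * x[1] * x[2] * x[3] * 1e-3
    delay = 1.0 / (x[0] * x[1] * x[3])
    return energy * delay

# Seed the surrogate model with a few feasible random designs.
X = [x for x in (sample_design() for _ in range(20)) if is_feasible(x)]
y = [edp(x) for x in X]

for _ in range(30):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(X), np.array(y))
    # Rejection-sample candidates so the acquisition function only ever
    # scores feasible designs, keeping the search away from invalid points.
    cands = [x for x in (sample_design() for _ in range(1000)) if is_feasible(x)]
    mu, sigma = gp.predict(np.array(cands), return_std=True)
    # Lower confidence bound: prefer low predicted EDP plus high uncertainty.
    best = cands[int(np.argmin(mu - sigma))]
    X.append(best)
    y.append(edp(best))

print("best design:", X[int(np.argmin(y))], "EDP:", min(y))
```

Rejection sampling is only one simple way to honor constraints; the point of the sketch is that feasibility is enforced before the surrogate is ever queried, which is the flavor of "avoiding invalid solutions" the abstract alludes to.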
