Paper Title

A$^3$: Accelerating Attention Mechanisms in Neural Networks with Approximation

Paper Authors

Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, Deog-Kyoon Jeong

Paper Abstract

With the increasing computational demands of neural networks, many hardware accelerators for neural networks have been proposed. These existing accelerators typically focus on popular network types such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs); however, not much attention has been paid to attention mechanisms, an emerging neural network primitive that enables a network to retrieve the most relevant information from a knowledge base, external memory, or past states. The attention mechanism is widely adopted by many state-of-the-art neural networks for computer vision, natural language processing, and machine translation, and accounts for a large portion of total execution time. We observe that today's practice of implementing this mechanism with matrix-vector multiplication is suboptimal, because the attention mechanism is semantically a content-based search in which a large portion of the computation ends up unused. Based on this observation, we design and architect A$^3$, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization. Our proposed accelerator achieves multiple orders of magnitude improvement in energy efficiency (performance/watt), as well as a substantial speedup, over state-of-the-art conventional hardware.
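To make the abstract's observation concrete, below is a minimal NumPy sketch contrasting exact dot-product attention (a matrix-vector multiply over every key) with a simple top-k variant that treats attention as a content-based search and skips low-scoring rows. The function names and the fixed candidate count `k` are illustrative assumptions for this sketch, not the paper's actual approximation algorithm or hardware design.

```python
import numpy as np

def exact_attention(query, keys, values):
    """Standard dot-product attention: score every key against the query."""
    scores = keys @ query                    # (n,) one matrix-vector multiply
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over all n rows
    return weights @ values                  # weighted sum of all value rows

def approx_attention(query, keys, values, k=64):
    """Content-based-search view: keep only the k highest-scoring keys.

    After the softmax, low-scoring rows contribute almost nothing, so
    dropping them approximates the exact output while avoiding most of
    the multiply-accumulate work.
    """
    scores = keys @ query
    top = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    s = scores[top]
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ values[top]

# Toy comparison on random data (illustrative only).
rng = np.random.default_rng(0)
n, d = 512, 64
keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
query = rng.standard_normal(d)

exact = exact_attention(query, keys, values)
approx = approx_attention(query, keys, values, k=64)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

The relative error is typically small because the softmax concentrates almost all weight on the highest-scoring keys, which is exactly the property the paper exploits with algorithmic approximation.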
