Paper Title
Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme
Paper Authors
Paper Abstract
Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, recurrent layers are executed many times for processing a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we observe that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result, avoiding in this way the output computations. The main challenge in this scheme is determining whether the new neuron's output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of different neural networks from multiple application domains. We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
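To make the idea concrete, the sketch below illustrates neuron-level fuzzy memoization guided by a binarized predictor. This is a minimal illustration, not the paper's implementation: the names (`W`, `theta`, `bnn_cache`, `out_cache`), the tanh activation, and the single-layer formulation are assumptions chosen for brevity. It shows the core control flow described in the abstract: evaluate the cheap BNN first, and only pay for the full-precision neuron computation when the BNN output differs noticeably from the cached one.

```python
# Minimal sketch of neuron-level fuzzy memoization (illustrative names, not the paper's code).
import numpy as np

def binarize(x):
    """Map values to {-1, +1}, the representation used by the BNN predictor."""
    return np.where(x >= 0.0, 1.0, -1.0)

def fuzzy_memo_layer(x_t, h_prev, W, theta, bnn_cache, out_cache):
    """One time step of a recurrent layer with per-neuron fuzzy memoization.

    x_t:       current input vector
    h_prev:    previous hidden state
    W:         weight matrix, one row per neuron, over [x_t, h_prev]
    theta:     similarity threshold on the BNN output (assumed scalar)
    bnn_cache: per-neuron cache of the last BNN output
    out_cache: per-neuron cache of the last full-precision output
    """
    z = np.concatenate([x_t, h_prev])
    z_bin = binarize(z)
    W_bin = binarize(W)
    h = np.empty(W.shape[0])

    for n in range(W.shape[0]):
        # Cheap BNN evaluation: dot product of binarized weights and inputs.
        bnn_out = W_bin[n] @ z_bin
        if abs(bnn_out - bnn_cache[n]) <= theta:
            # BNN outputs are similar -> reuse the memoized RNN output,
            # skipping the expensive full-precision dot product.
            h[n] = out_cache[n]
        else:
            # Otherwise compute the neuron output normally and refresh both caches.
            h[n] = np.tanh(W[n] @ z)
            out_cache[n] = h[n]
            bnn_cache[n] = bnn_out
    return h
```

In this reading, the threshold `theta` trades accuracy for computation savings: a larger threshold lets more neuron evaluations be replaced by cached outputs, which is the mechanism behind the reported computation reduction and energy savings.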