Paper Title

FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

Paper Authors

Shien Zhu, Luan H. K. Duong, Hui Chen, Di Liu, Weichen Liu

Paper Abstract

Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs achieve higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited research attention. TWNs run inefficiently on existing IMC devices because their sparsity is not well exploited and the addition operations are not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing the carry back to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and to increase parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% average sparsity.
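For intuition, the following minimal Python sketch (not the paper's hardware design; the function names are illustrative) shows why ternary weights in {-1, 0, +1} turn multiply-accumulate into sparse additions: zero weights are skipped entirely, which is the software analogue of the Sparse Addition Control Unit, and the remaining terms only add or subtract the activation.

```python
import numpy as np

def ternary_dot(activations: np.ndarray, weights: np.ndarray) -> int:
    """Dot product with ternary weights in {-1, 0, +1}.

    Multiplication disappears: each nonzero weight contributes the
    activation itself (+1) or its negation (-1), and zero weights
    are skipped entirely, mirroring how a sparsity-aware controller
    can avoid issuing null operations in memory.
    """
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:                      # sparsity: skip, no work done
            continue
        acc += a if w > 0 else -a       # addition/subtraction only
    return acc

# Example: 80% of the weights are zero, so only ~20% of the positions
# trigger any work, matching the abstract's average-sparsity figure.
rng = np.random.default_rng(0)
acts = rng.integers(0, 16, size=10)
wts = np.array([0, 1, 0, 0, -1, 0, 0, 1, 0, 0])
assert ternary_dot(acts, wts) == int(acts @ wts)
```

The second contribution, avoiding carry propagation, can be illustrated with classical carry-save arithmetic, where carries are kept in a separate word instead of rippling through the adder. This is a generic sketch of that idea, not FAT's specific sense-amplifier circuit.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three addends to a (sum, carry) pair without
    propagating carries -- the trick behind carry-free adder stages.
    A final conventional add resolves the deferred carries once.
    """
    s = x ^ y ^ z                             # bitwise sum, no carry chain
    c = ((x & y) | (y & z) | (x & z)) << 1    # carries, kept separate
    return s, c

s, c = carry_save_add(5, 6, 7)
assert s + c == 5 + 6 + 7  # one final propagate recovers the total
```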
