Paper Title
A MAC-less Neural Inference Processor Supporting Compressed, Variable Precision Weights
Paper Authors
Abstract
This paper introduces two architectures for the inference of convolutional neural networks (CNNs). Both architectures exploit weight sparsity and compression to reduce computational complexity and bandwidth. The first architecture uses multiply-accumulators (MACs) but avoids unnecessary multiplications by skipping zero weights. The second architecture exploits weight sparsity at the level of their bit representation by replacing resource-intensive MACs with much smaller Bit Layer Multiply Accumulators (BLMACs). The use of BLMACs also allows variable precision weights, represented as variable size integers or even floating point values. Some details of an implementation of the second architecture are given. Weight compression with arithmetic coding is also discussed, along with its bandwidth implications. Finally, implementation results for a pathfinder design in various technologies are presented.
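The two ideas summarized above can be illustrated with a minimal sketch. The function names (`sparse_mac_dot`, `blmac_dot`) and the exact decomposition are assumptions for illustration, not the paper's actual hardware datapath: the first skips zero weights entirely, and the second replaces each multiply with shift-and-add operations over the nonzero bit layers of the weight, which is the general principle behind a bit-layer accumulator.

```python
def sparse_mac_dot(activations, weights):
    """Dot product on a MAC-style datapath that skips zero weights,
    so sparse weight vectors cost proportionally fewer multiplies."""
    acc = 0
    for a, w in zip(activations, weights):
        if w != 0:          # zero-weight skipping: no multiply issued
            acc += a * w
    return acc


def blmac_dot(activations, weights):
    """Dot product without any multiplier, in the spirit of a Bit Layer
    Multiply Accumulator (BLMAC): each weight is decomposed into its bit
    layers, and for every nonzero bit the activation is shifted by the
    bit position and added. Zero bit layers (bit-level sparsity) cost
    nothing, and weights of any bit width are handled uniformly."""
    acc = 0
    for a, w in zip(activations, weights):
        sign = -1 if w < 0 else 1
        w = abs(w)
        b = 0
        while w:
            if w & 1:                   # nonzero bit layer at position b
                acc += sign * (a << b)  # shift-and-add replaces multiply
            w >>= 1
            b += 1
    return acc
```

Both functions compute the same dot product; the BLMAC variant's work scales with the number of set bits in the weights rather than with the number of weights times a fixed multiplier width.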