Paper Title
TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems
Paper Authors
Paper Abstract
Resistive crossbars have attracted significant interest in the design of Deep Neural Network (DNN) accelerators due to their ability to natively execute massively parallel vector-matrix multiplications within dense memory arrays. However, crossbar-based computations face a major challenge due to a variety of device and circuit-level non-idealities, which manifest as errors in the vector-matrix multiplications and eventually degrade DNN accuracy. To address this challenge, there is a need for tools that can model the functional impact of non-idealities on DNN training and inference. Existing efforts towards this goal are either limited to inference or too slow to be used for large-scale DNN training. We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware considering the impact of non-idealities. The key features of TxSim that differentiate it from prior efforts are: (i) it comprehensively models non-idealities during all training operations (forward propagation, backward propagation, and weight update), and (ii) it achieves computational efficiency by mapping crossbar evaluations to well-optimized BLAS routines and incorporates speedup techniques to further reduce simulation time with minimal impact on accuracy. TxSim achieves orders-of-magnitude improvement in simulation speed over prior work, and thereby makes it feasible to evaluate training of large-scale DNNs on crossbars. Our experiments using TxSim reveal that the accuracy degradation in DNN training due to non-idealities can be substantial (3%-10%) for large-scale DNNs, underscoring the need for further research in mitigation techniques. We also analyze the impact of various device and circuit-level parameters and the associated non-idealities to provide key insights that can guide the design of crossbar-based DNN training accelerators.
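The idea of folding non-idealities into an ordinary matrix product can be illustrated with a small sketch. It is not TxSim's actual model: the two non-idealities chosen here (limited conductance precision and multiplicative device variation), the parameter names `levels` and `sigma`, and all function names are assumptions made for illustration, and only a forward vector-matrix multiplication is covered (no backward pass or weight update). The point is that once the errors are absorbed into an "effective" weight matrix, the crossbar evaluation reduces to a standard product, which in a NumPy-style setting would typically be handed to a BLAS-backed matrix multiply.

```python
import random


def quantize(value, levels, lo, hi):
    """Snap value to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)
    return lo + round((clipped - lo) / step) * step


def to_effective_weights(weights, levels=16, sigma=0.05, seed=0):
    """Fold two hypothetical non-idealities into an effective weight matrix.

    levels: number of programmable conductance states (assumed).
    sigma:  std. dev. of multiplicative device variation (assumed).
    """
    rng = random.Random(seed)
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    return [
        [quantize(w, levels, -w_max, w_max) * (1.0 + rng.gauss(0.0, sigma))
         for w in row]
        for row in weights
    ]


def vmm(vector, matrix):
    """Plain vector-matrix product: y[j] = sum_i x[i] * W[i][j]."""
    cols = len(matrix[0])
    return [sum(x * row[j] for x, row in zip(vector, matrix))
            for j in range(cols)]


def crossbar_vmm(vector, weights, **nonideal_params):
    """Non-ideal crossbar output = ordinary product with effective weights."""
    return vmm(vector, to_effective_weights(weights, **nonideal_params))


if __name__ == "__main__":
    x = [2.0, 3.0]
    w = [[1.0, -1.0], [0.0, 1.0]]
    print("ideal    :", vmm(x, w))
    print("non-ideal:", crossbar_vmm(x, w, levels=16, sigma=0.05, seed=1))
```

Setting `sigma=0.0` and choosing `levels` so that every weight lies exactly on a quantization point recovers the ideal product, which gives a quick sanity check before any error source is switched on.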