Paper Title

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Paper Authors

Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis

Paper Abstract

The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during $\mathit{inference}$, however, they do not efficiently support the emerging sparse $\mathit{training}$ techniques. In this paper, we demonstrate (a) that accelerating sparse training requires a co-design approach where algorithms are adapted to suit the constraints of hardware, and (b) that hardware for sparse DNN training must tackle constraints that do not arise in inference accelerators. As proof of concept, we adapt a sparse training algorithm to be amenable to hardware acceleration; we then develop dataflow, data layout, and load-balancing techniques to accelerate it. The resulting system is a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models without first training, then pruning, and finally retraining, a dense model. Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
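
The abstract describes adapting a sparse training algorithm so that pruning happens during training rather than via the usual train-prune-retrain pipeline. The paper's actual algorithm and hardware dataflow are not detailed here; as a rough illustration of the general idea only, the following PyTorch sketch periodically applies magnitude-based pruning masks while training. All function names, the toy model, and the hyperparameters (e.g., 90% sparsity, matching the abstract's "order of magnitude" weight pruning) are hypothetical assumptions, not the Procrustes method itself.

```python
import torch
import torch.nn as nn

def magnitude_prune_masks(model: nn.Module, sparsity: float) -> dict:
    """Build per-layer binary masks that zero the smallest-magnitude weights.

    Hypothetical helper: prunes each weight tensor to the given sparsity
    level; biases and other 1-D parameters are left dense.
    """
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:
            threshold = torch.quantile(param.detach().abs().flatten(), sparsity)
            masks[name] = (param.detach().abs() > threshold).float()
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights so they stay inactive in the next forward pass."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# Illustrative loop: the model is sparse throughout training, so there is
# no separate dense-training, pruning, and retraining phase.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

masks = magnitude_prune_masks(model, sparsity=0.9)
for step in range(1000):
    x = torch.randn(32, 784)            # stand-in for real training data
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:                 # periodically re-derive the masks
        masks = magnitude_prune_masks(model, sparsity=0.9)
    apply_masks(model, masks)
```

Note that a software sketch like this only captures the algorithmic side; the abstract's point is that the corresponding sparse weight and gradient tensors also need dedicated dataflow, data layout, and load-balancing support in hardware to be trained efficiently.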
