Title

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training

Authors

Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo

Abstract

An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy, but they also consume massive amounts of memory during training with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for element-wise activation functions, for memory-efficient DNN training. We deploy nested Forward-AD in two widely used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic computation graphs, respectively. Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared with the baseline model and outperforms recomputation by 20% under the same memory reduction ratio.
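
The sketch below is not the authors' released code; it is a minimal illustration of the general idea in PyTorch, assuming a hypothetical composite Swish-style activation (`swish`) and wrapper class (`FwdADSwish`). Instead of letting reverse-mode autograd keep every intermediate of the element-wise activation alive, the wrapper saves only the input and re-derives the per-element derivative with forward-mode AD (a Jacobian-vector product) during the backward pass.

```python
import torch
import torch.autograd.forward_ad as fwAD


def swish(x):
    # Composite element-wise activation. Called directly under reverse-mode
    # autograd, it keeps intermediates such as sigmoid(x) alive for backward.
    return x * torch.sigmoid(x)


class FwdADSwish(torch.autograd.Function):
    """Hypothetical wrapper: save only the input and re-derive the local
    derivative with forward-mode AD during the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # no intermediate tensors are kept
        return swish(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with fwAD.dual_level():
            # Seeding the tangent with ones yields the element-wise derivative
            # f'(x), since the JVP of an element-wise function is f'(x) * v.
            dual_x = fwAD.make_dual(x, torch.ones_like(x))
            _, dfdx = fwAD.unpack_dual(swish(dual_x))
        return grad_out * dfdx


# Usage: gradients match the plain swish(x), but only x is stored for backward.
x = torch.randn(4, requires_grad=True)
FwdADSwish.apply(x).sum().backward()
```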
