Paper Title

AutoInit: Automatic Initialization via Jacobian Tuning

Authors

Tianyu He, Darshil Doshi, Andrey Gromov

Abstract

Good initialization is essential for training Deep Neural Networks (DNNs). Oftentimes such initialization is found through a trial-and-error approach, which has to be applied anew every time an architecture is substantially modified, or inherited from smaller-size networks, leading to sub-optimal initialization. In this work we introduce a new and cheap algorithm that allows one to find a good initialization automatically for general feed-forward DNNs. The algorithm utilizes the Jacobian between adjacent network blocks to tune the network hyperparameters to criticality. We solve the dynamics of the algorithm for fully connected networks with ReLU and derive conditions for its convergence. We then extend the discussion to more general architectures with BatchNorm and residual connections. Finally, we apply our method to ResMLP and VGG architectures, where the automatic one-shot initialization found by our method shows good performance on vision tasks.
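The core mechanism the abstract describes, tuning the block-to-block Jacobian to criticality, can be illustrated with a minimal sketch. The PyTorch code below is an illustration of the idea, not the authors' implementation: it estimates the scale of each block's input-output Jacobian with random vector-Jacobian products (which tracks the mean squared singular value of the Jacobian) and multiplicatively rescales the block's weights until that scale is close to 1. The function names (`autoinit_sketch`, `estimate_block_jacobian_norm`) and the simple `1/j` update rule are hypothetical assumptions.

```python
import torch
import torch.nn as nn


def estimate_block_jacobian_norm(block, x, n_samples=8):
    """Estimate the scale of the block's input-output Jacobian J via
    random vector-Jacobian products: sqrt(E||J^T v||^2 / E||v||^2),
    which tracks the mean squared singular value of J."""
    x = x.detach().requires_grad_(True)
    y = block(x)
    ratios = []
    for _ in range(n_samples):
        v = torch.randn_like(y)
        (g,) = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)
        ratios.append(g.pow(2).sum() / v.pow(2).sum())
    return torch.stack(ratios).mean().sqrt()


@torch.no_grad()
def rescale_weights(block, factor):
    # Rescale weight matrices only; biases shift preactivations but do not
    # directly set the Jacobian scale of a (piecewise-)linear block.
    for p in block.parameters():
        if p.dim() > 1:
            p.mul_(factor)


def autoinit_sketch(blocks, x, tol=0.05, max_iters=50):
    """Layer by layer, tune each block's weight scale until the
    block-to-block Jacobian norm is ~1 (criticality)."""
    for block in blocks:
        for _ in range(max_iters):
            j = estimate_block_jacobian_norm(block, x).item()
            if abs(j - 1.0) < tol:
                break
            rescale_weights(block, 1.0 / j)  # hypothetical update rule
        x = block(x).detach()  # feed the tuned block's output forward
    return blocks


if __name__ == "__main__":
    torch.manual_seed(0)
    blocks = [nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(10)]
    x = torch.randn(64, 256)
    autoinit_sketch(blocks, x)
```

Note that this sketch covers only plain feed-forward blocks; the paper additionally treats BatchNorm and residual connections, which this toy update does not handle.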
