Paper Title

Meta-Principled Family of Hyperparameter Scaling Strategies

Authors

Yaida, Sho

Abstract

In this note, we first derive a one-parameter family of hyperparameter scaling strategies that interpolates between the neural-tangent scaling and mean-field/maximal-update scaling. We then calculate the scalings of dynamical observables -- network outputs, neural tangent kernels, and differentials of neural tangent kernels -- for wide and deep neural networks. These calculations in turn reveal a proper way to scale depth with width such that resultant large-scale models maintain their representation-learning ability. Finally, we observe that various infinite-width limits examined in the literature correspond to the distinct corners of the interconnected web spanned by effective theories for finite-width neural networks, with their training dynamics ranging from being weakly-coupled to being strongly-coupled.
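To make the abstract's central idea concrete, the sketch below is a minimal illustration (not the paper's construction) of a one-parameter interpolation between the neural-tangent and mean-field/maximal-update scalings. It assumes the interpolation can be expressed through width-dependent exponents on an output-layer multiplier and a learning-rate scale, with a hypothetical parameter `s` where `s = 0` mimics NTK-like scaling and `s = 1` mimics mean-field/maximal-update-like scaling; the specific exponents are illustrative assumptions, not the paper's prescription.

```python
import math


def interpolated_scalings(width: int, s: float):
    """Hypothetical one-parameter interpolation between NTK-like (s=0)
    and mean-field/maximal-update-like (s=1) width scalings.

    Illustrative assumptions (not taken from the paper):
      * the output-layer multiplier scales as width**(-(1+s)/2),
        interpolating between 1/sqrt(n) (NTK-like) and 1/n (mean-field-like);
      * the learning-rate scale grows as width**s, compensating the
        suppressed multiplier so that updates remain order one.
    """
    output_multiplier = width ** (-(1.0 + s) / 2.0)
    learning_rate_scale = float(width) ** s
    return output_multiplier, learning_rate_scale


if __name__ == "__main__":
    # Compare how the two quantities scale with width at the interpolation endpoints.
    for s in (0.0, 0.5, 1.0):
        for n in (128, 1024, 8192):
            mult, lr = interpolated_scalings(n, s)
            print(f"s={s:.1f}  width={n:5d}  "
                  f"output multiplier={mult:.2e}  LR scale={lr:.1f}  "
                  f"(1/sqrt(n)={1/math.sqrt(n):.2e})")
```

Running the sketch shows the qualitative mechanism the abstract alludes to: as `s` moves from 0 to 1 the output multiplier decays faster in width while the learning-rate scale grows to compensate, which is the kind of trade-off that distinguishes the lazy (kernel-like) and representation-learning regimes.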
