Paper Title

Normalization effects on deep neural networks

Paper Authors

Jiahui Yu, Konstantinos Spiliopoulos

Paper Abstract

We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$, and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as its variance) as well as on the test accuracy on the MNIST data set. We find that, in terms of the variance of the neural network's output and the test accuracy, the best choice is to set the $\gamma_{i}$'s equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion of the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.

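The layer-wise scaling described in the abstract is straightforward to illustrate. Below is a minimal sketch, assuming a fully-connected network in which the $N_i$ hidden units of layer $i$ are aggregated with the factor $1/N_i^{\gamma_i}$; the function name `forward` and all parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (an assumed reading of the abstract, not the authors' code):
# a feed-forward network in which the N_i hidden units of layer i are
# aggregated with the factor 1 / N_i**gamma_i, gamma_i in [1/2, 1].
# gamma_i = 1 is the mean-field scaling; gamma_i = 1/2 is the usual
# 1/sqrt(N_i) scaling.

def forward(x, hidden_weights, c, gammas, activation=np.tanh):
    """Forward pass with layer-wise 1/N_i^gamma_i normalization.

    x              : input vector of shape (d,)
    hidden_weights : [W_1, ..., W_L]; W_1 has shape (N_1, d), W_k shape (N_k, N_{k-1})
    c              : outer-layer weight vector of shape (N_L,)
    gammas         : [gamma_1, ..., gamma_L], each in [1/2, 1]
    """
    h = activation(hidden_weights[0] @ x)                # layer 1: N_1 hidden units
    for W, gamma in zip(hidden_weights[1:], gammas[:-1]):
        N_i = h.shape[0]                                 # units of the layer being aggregated
        h = activation((W @ h) / N_i ** gamma)           # sum over N_i units, scaled by 1/N_i^gamma_i
    N_L = h.shape[0]
    return float(c @ h) / N_L ** gammas[-1]              # scalar output with outer-layer scaling


# Toy usage: two hidden layers of 100 units each, mean-field scaling (gamma_i = 1).
rng = np.random.default_rng(0)
d, N1, N2 = 4, 100, 100
Ws = [rng.normal(size=(N1, d)), rng.normal(size=(N2, N1))]
c = rng.normal(size=N2)
print(forward(rng.normal(size=d), Ws, c, gammas=[1.0, 1.0]))
```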