Paper Title

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

Paper Authors

Jinlong Liu, Guoqing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang

Paper Abstract

As deep neural networks (DNNs) achieve tremendous success across many application domains, researchers have tried to explore, from many aspects, why they generalize well. In this paper, we provide a novel perspective on these issues using the gradient signal-to-noise ratio (GSNR) of parameters during the training process of DNNs. The GSNR of a parameter is defined as the ratio between the squared mean and the variance of its gradient over the data distribution. Based on several approximations, we establish a quantitative relationship between the GSNR of the model parameters and the generalization gap. This relationship indicates that a larger GSNR during training leads to better generalization performance. Moreover, we show that, unlike shallow models (e.g. logistic regression, support vector machines), the gradient descent optimization dynamics of DNNs naturally produce a large GSNR during training, which is probably the key to DNNs' remarkable generalization ability.
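To make the definition concrete, below is a minimal numpy sketch of how the GSNR of each parameter could be estimated from per-sample gradients: for each parameter, the squared mean of its per-sample gradient divided by its variance over the data. The function name gsnr, the array layout, and the toy data are illustrative assumptions introduced here, not code from the paper.

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Estimate the GSNR of each parameter from per-sample gradients.

    per_sample_grads: array of shape (n_samples, n_params), where row i is the
    gradient of the loss on sample i with respect to the parameters.
    GSNR(theta_j) = (mean over samples of g_j)^2 / (variance over samples of g_j).
    """
    mean = per_sample_grads.mean(axis=0)        # gradient "signal" per parameter
    var = per_sample_grads.var(axis=0) + eps    # gradient "noise"; eps avoids division by zero
    return mean ** 2 / var

# Toy usage: 1000 samples, 5 parameters. Parameter 0 has a consistent gradient
# direction across samples (high GSNR); parameter 4 is mostly noise (low GSNR).
rng = np.random.default_rng(0)
signal = np.linspace(1.0, 0.0, 5)               # per-parameter mean gradient
grads = signal + 0.5 * rng.standard_normal((1000, 5))
print(gsnr(grads))
```

Under this toy setup, parameter 0 yields a GSNR of roughly 4 (mean 1.0, noise variance 0.25), while parameter 4 yields a value near 0, illustrating the quantity the paper relates to the generalization gap.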
