Paper Title

On the Activation Function Dependence of the Spectral Bias of Neural Networks

Paper Authors

Qingguo Hong, Jonathan W. Siegel, Qinyang Tan, Jinchao Xu

Paper Abstract

Neural networks are universal function approximators which are known to generalize well despite being dramatically overparameterized. We study this phenomenon from the point of view of the spectral bias of neural networks. Our contributions are two-fold. First, we provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods. Second, based upon this theory we predict that switching the activation function to a piecewise linear B-spline, namely the Hat function, will remove this spectral bias, which we verify empirically in a variety of settings. Our empirical studies also show that neural networks with the Hat activation function are trained significantly faster using stochastic gradient descent and ADAM. Combined with previous work showing that the Hat activation function also improves generalization accuracy on image classification tasks, this indicates that using the Hat activation provides significant advantages over the ReLU on certain problems.
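
For readers who want to experiment with the idea in the abstract, below is a minimal PyTorch sketch of a hat activation used as a drop-in replacement for ReLU. The specific form here, a piecewise linear B-spline supported on [0, 2] with a peak at 1, written as a combination of three ReLUs, is one common definition assumed for illustration; the `Hat` module and the network shape are hypothetical, not the authors' code.

```python
import torch


def hat(x: torch.Tensor) -> torch.Tensor:
    # Piecewise linear B-spline ("hat") activation, assumed here to be
    # supported on [0, 2] with a peak of 1 at x = 1:
    #   hat(x) = x      for 0 <= x <= 1
    #   hat(x) = 2 - x  for 1 <= x <= 2
    #   hat(x) = 0      otherwise
    # Writing it as a combination of ReLUs keeps it autograd-friendly.
    return torch.relu(x) - 2.0 * torch.relu(x - 1.0) + torch.relu(x - 2.0)


class Hat(torch.nn.Module):
    """Drop-in replacement for torch.nn.ReLU using the hat function."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return hat(x)


# Hypothetical usage: a small MLP for 1-D function fitting, with hat
# activations swapped in where ReLU would normally appear.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 64),
    Hat(),
    torch.nn.Linear(64, 64),
    Hat(),
    torch.nn.Linear(64, 1),
)
```

Expressing the hat function as a sum of shifted ReLUs is a convenient design choice: it reuses a standard primitive, so the activation remains differentiable almost everywhere and works with any autograd framework without custom gradients.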
