Paper Title
Investigating the interaction between gradient-only line searches and different activation functions
Paper Authors
Paper Abstract
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for the discontinuous loss functions that result from dynamic mini-batch sub-sampling in neural network training. Step sizes in GOLS are determined by localizing Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs) along descent directions. These are identified by a sign change in the directional derivative from negative to positive along a descent direction. Activation functions are a significant component of neural network architectures, as they introduce the non-linearities essential for complex function approximation. The smoothness and continuity characteristics of the activation functions directly affect the gradient characteristics of the loss function to be optimized. Therefore, it is of interest to investigate the relationship between activation functions and different neural network architectures in the context of GOLS. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures. The zero derivative in ReLU's negative input domain can cause the gradient vector to become sparse, which severely affects training. We show that implementing architectural features such as batch normalization and skip connections can alleviate these difficulties and benefit training with GOLS for all activation functions considered.
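To make the step-size mechanism concrete, the following is a minimal sketch of a gradient-only line search that brackets and bisects a negative-to-positive sign change of the directional derivative along a descent direction. It assumes a user-supplied mini-batch gradient function `grad_fn`; the growth-and-bisection scheme, parameter names, and constants are illustrative assumptions, not the authors' GOLS implementation.

```python
import numpy as np

def directional_derivative(grad_fn, x, d, alpha):
    """Directional derivative g(x + alpha*d)^T d, where grad_fn returns a
    (freshly sub-sampled) mini-batch gradient at the evaluated point."""
    return np.dot(grad_fn(x + alpha * d), d)

def gradient_only_line_search(grad_fn, x, d, alpha0=1e-2, grow=2.0,
                              max_grow=30, bisections=10):
    """Return a step size just past a negative-to-positive sign change of
    the directional derivative along the descent direction d (an SNN-GPP)."""
    lo, hi = 0.0, alpha0
    dd = directional_derivative(grad_fn, x, d, hi)
    # Grow the step size until the directional derivative becomes non-negative.
    for _ in range(max_grow):
        if dd >= 0.0:
            break
        lo, hi = hi, hi * grow
        dd = directional_derivative(grad_fn, x, d, hi)
    # Bisect [lo, hi] to localize the sign change; if no sign change was
    # found during growth, this simply returns the largest step tried.
    for _ in range(bisections):
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad_fn, x, d, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi
```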
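The sparsity issue noted for ReLU can also be illustrated with a small NumPy check (a hypothetical example, not taken from the paper): for a single ReLU layer, back-propagated gradient entries vanish wherever the pre-activation is negative, so a poorly scaled layer can zero out a large fraction of the gradient vector.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))        # layer weights (illustrative sizes)
x = rng.normal(size=32)              # layer input
z = W @ x                            # pre-activations
relu_grad = (z > 0).astype(float)    # ReLU derivative: 0 for negative inputs
upstream = rng.normal(size=64)       # gradient flowing back from the loss
grad_z = upstream * relu_grad        # zeroed wherever z < 0

print("fraction of zeroed gradient entries:", np.mean(grad_z == 0.0))
```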