Paper Title
Efficiently Training Low-Curvature Neural Networks
Paper Authors
Paper Abstract
The highly non-linear nature of deep neural networks makes them susceptible to adversarial examples and gives them unstable gradients that hinder interpretability. However, existing methods to address these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, a mathematical quantity that encodes the degree of non-linearity. Using this quantity, we demonstrate low-curvature neural networks (LCNNs) that obtain drastically lower curvature than standard models while exhibiting similar predictive performance, which leads to improved robustness and stable gradients, with only a marginally increased training time. To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes the overall curvature in terms of the curvatures and slopes of its constituent layers. To minimize this bound efficiently, we introduce two novel architectural components: first, a non-linearity called centered-softplus, a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained batch normalization layer. Our experiments show that LCNNs have lower curvature, more stable gradients, and increased off-the-shelf adversarial robustness compared to their standard high-curvature counterparts, all without affecting predictive performance. Our approach is easy to use and can be readily incorporated into existing neural network models.
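The abstract's layer-wise decomposition of curvature can be made concrete. Writing L(·) for a Lipschitz (slope) bound and C(·) for a curvature (second-derivative) bound, compositional inequalities of the following form underlie such decompositions. This is a sketch consistent with the abstract's description; the paper's exact constants and norms may differ.

```latex
% Sketch of a compositional curvature bound (constants and norms assumed).
% For twice-differentiable maps f and g,
\[
    C(g \circ f) \;\le\; C(g)\,L(f)^{2} \;+\; L(g)\,C(f).
\]
% Unrolled over a network f = f_k \circ \cdots \circ f_1, the right-hand
% side depends only on the per-layer slopes L(f_i) and curvatures C(f_i),
% so the bound is data-independent and can be penalized during training.
```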
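The abstract only names centered-softplus as a stable variant of softplus; a natural reading is softplus shifted so that it passes through the origin. The PyTorch sketch below implements that reading. The class name, the beta parameter, and the centering constant log(2)/beta are assumptions made for illustration, not the authors' released code.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class CenteredSoftplus(nn.Module):
    """Hypothetical centered-softplus: softplus shifted so psi(0) = 0.

    psi_beta(x) = (1/beta) * log(1 + exp(beta * x)) - log(2)/beta.
    beta controls the sharpness (and hence the curvature) of the bend.
    A sketch inferred from the abstract, not the paper's code.
    """

    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.softplus computes (1/beta) * log(1 + exp(beta * x)) in a
        # numerically stable way; subtracting log(2)/beta centers it at 0.
        return F.softplus(x, beta=self.beta) - math.log(2.0) / self.beta
```

Since the second derivative of softplus with sharpness beta is beta * sigma(beta x) * (1 - sigma(beta x)) <= beta/4, shrinking beta directly shrinks this layer's curvature term in a bound of the form sketched above.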
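Likewise, the abstract does not spell out the Lipschitz-constrained batch normalization layer. One way to realize it, sketched below under stated assumptions, is to cap the effective per-channel slope gamma_c / sqrt(var_c + eps), since the largest such slope is the layer's Lipschitz constant. The class name, the max_lipschitz parameter, and the uniform rescaling strategy are hypothetical; running-statistics bookkeeping is simplified relative to nn.BatchNorm2d.

```python
import torch
import torch.nn as nn


class LipschitzBatchNorm2d(nn.BatchNorm2d):
    """Hypothetical BatchNorm whose Lipschitz constant is bounded.

    The slope of an affine BatchNorm along channel c is
    gamma_c / sqrt(var_c + eps); the layer's Lipschitz constant is the
    maximum of these slopes. We rescale so it never exceeds max_lipschitz.
    """

    def __init__(self, num_features: int, max_lipschitz: float = 1.0, **kwargs):
        super().__init__(num_features, **kwargs)
        self.max_lipschitz = max_lipschitz

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            with torch.no_grad():  # update running stats as in standard BN
                self.running_mean.lerp_(mean, self.momentum)
                self.running_var.lerp_(var, self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        scale = self.weight / torch.sqrt(var + self.eps)
        # Shrink all channels uniformly whenever the largest per-channel
        # slope would exceed the allowed Lipschitz bound.
        factor = torch.clamp(self.max_lipschitz / scale.abs().max(), max=1.0)
        scale = factor * scale
        return (x - mean.view(1, -1, 1, 1)) * scale.view(1, -1, 1, 1) \
            + self.bias.view(1, -1, 1, 1)
```

With these two pieces, a drop-in experiment might swap nn.ReLU() for CenteredSoftplus() and nn.BatchNorm2d(c) for LipschitzBatchNorm2d(c) in an existing model; the abstract suggests that such substitutions leave predictive performance essentially unchanged.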