Paper Title


Efficient Global Optimization of Two-Layer ReLU Networks: Quadratic-Time Algorithms and Adversarial Training

Paper Authors

Yatong Bai, Tanmay Gautam, Somayeh Sojoudi

Paper Abstract


The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of an ANN with ReLU activations can be reformulated as a convex program, bringing hope for globally optimizing interpretable ANNs. However, naively solving the convex training formulation has exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of multipliers (ADMM). It solves both the exact convex formulation and the approximate counterpart. Linear global convergence is achieved, and the first several iterations often yield a solution with high prediction accuracy. When solving the approximate formulation, the per-iteration time complexity is quadratic. The second algorithm, based on the "sampled convex programs" theory, solves unconstrained convex formulations and converges to an approximately globally optimal classifier. The non-convexity of the ANN training landscape is exacerbated when adversarial training is considered. We apply robust convex optimization theory to convex training and develop convex formulations that train ANNs robust to adversarial inputs. Our analysis explicitly focuses on one-hidden-layer fully connected ANNs, but it can be extended to more sophisticated architectures.
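
As a rough illustration of the convex reformulation discussed in the abstract, the sketch below sets up an approximate (sampled-activation-pattern) convex program for a one-hidden-layer ReLU network with squared loss. This is not the authors' code or algorithm: the toy data, the number of sampled patterns, the regularization strength, and the use of numpy/cvxpy are all illustrative assumptions.

```python
# Minimal sketch: approximate convex training of a one-hidden-layer ReLU network
# by sampling ReLU activation patterns and solving the resulting convex program.
# Assumes numpy and cvxpy are available; all problem sizes are toy choices.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 40, 5                        # n samples, d features (toy data)
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))

# Sample a subset of ReLU activation patterns D_i = diag(1{X u_i >= 0}).
P = 20
patterns = np.array([(X @ rng.standard_normal(d) >= 0).astype(float)
                     for _ in range(P)])
patterns = np.unique(patterns, axis=0)   # drop duplicate patterns
P = patterns.shape[0]

beta = 1e-3                         # group-sparsity (weight-decay) strength
V = cp.Variable((P, d))             # "positive" neuron weight blocks
W = cp.Variable((P, d))             # "negative" neuron weight blocks

residual, reg, constraints = 0, 0, []
for i in range(P):
    D = patterns[i]                 # 0/1 vector of length n
    residual = residual + cp.multiply(D, X @ (V[i] - W[i]))
    # Keep each block consistent with its activation pattern:
    # (2 D_i - I) X v_i >= 0 and (2 D_i - I) X w_i >= 0.
    S = 2 * D - 1
    constraints += [cp.multiply(S, X @ V[i]) >= 0,
                    cp.multiply(S, X @ W[i]) >= 0]
    reg = reg + cp.norm(V[i], 2) + cp.norm(W[i], 2)

objective = cp.Minimize(0.5 * cp.sum_squares(residual - y) + beta * reg)
problem = cp.Problem(objective, constraints)
problem.solve()
print("convex objective value:", problem.value)
```

From an optimal solution of such a program, a two-layer ReLU network can be recovered by mapping each nonzero block v_i (and w_i) to a hidden neuron with first-layer weights v_i / ||v_i|| and second-layer weight ||v_i|| (negated for the w blocks); the exact and robust (adversarial-training) formulations studied in the paper add further structure on top of this template.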
