Paper Title

Learning to Initialize Gradient Descent Using Gradient Descent

Authors

Kartik Ahuja, Amit Dhurandhar, Kush R. Varshney

Abstract

Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization rules are carefully designed by exploiting the nature of the problem class. As a simple alternative to hand-crafted initialization rules, we propose an approach for learning "good" initialization rules from previous solutions. We provide theoretical guarantees that establish conditions that are sufficient in all cases and also necessary in some under which our approach performs better than random initialization. We apply our methodology to various non-convex problems such as generating adversarial examples, generating post hoc explanations for black-box machine learning models, and allocating communication spectrum, and show consistent gains over other initialization techniques.
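The idea of learning an initialization rule from previously solved instances can be illustrated with a minimal sketch. This is NOT the paper's algorithm; it is a toy stand-in in which the "learning" step is an ordinary least-squares fit mapping problem parameters to past solutions, which then supplies starting points for gradient descent on new instances. All function names and the toy objective are hypothetical.

```python
# Illustrative sketch only -- not the authors' method. It shows the general
# pattern from the abstract: solve training instances of a non-convex
# problem family, fit an initialization rule to those solutions, and use
# the rule to warm-start gradient descent on a new instance.
import numpy as np

rng = np.random.default_rng(0)

def make_problem(theta):
    """A toy non-convex objective parameterized by theta:
    f(x) = (x - theta)^2 + sin(5x)."""
    f = lambda x: (x - theta) ** 2 + np.sin(5 * x)
    grad = lambda x: 2 * (x - theta) + 5 * np.cos(5 * x)
    return f, grad

def gradient_descent(grad, x0, lr=0.05, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# "Previous solutions": solve 50 training instances from random starts.
thetas = rng.uniform(-2, 2, size=50)
solutions = []
for t in thetas:
    _, g = make_problem(t)
    solutions.append(gradient_descent(g, rng.uniform(-5, 5)))

# Learn a linear initialization rule x0 = a * theta + b by least squares
# (a simple regression standing in for the paper's learned initializer).
A = np.vstack([thetas, np.ones_like(thetas)]).T
a, b = np.linalg.lstsq(A, np.array(solutions), rcond=None)[0]

# On a new instance, the learned rule supplies the starting point.
theta_new = 1.3
f_new, g_new = make_problem(theta_new)
x_learned = gradient_descent(g_new, a * theta_new + b)
x_random = gradient_descent(g_new, rng.uniform(-5, 5))
print(f_new(x_learned), f_new(x_random))
```

The design choice mirrors the abstract's comparison: the learned rule places the iterate near the basin that past solutions occupied, whereas a random start may land in a poor local minimum of the sinusoidal term.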
