Title
On Implicit Bias in Overparameterized Bilevel Optimization
Authors
Abstract
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
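For concreteness, the nested structure described above is usually written as the standard bilevel program below; the symbols (f for the outer objective, g for the inner objective, lambda for the outer parameters, w for the inner parameters) are our own notation for illustration, not taken from the paper:

\[
\lambda^\star \in \arg\min_{\lambda} \, f\bigl(\lambda, w^\star(\lambda)\bigr)
\quad \text{subject to} \quad
w^\star(\lambda) \in \arg\min_{w} \, g(\lambda, w).
\]

When the inner problem is overparameterized, \(\arg\min_{w} g(\lambda, w)\) is a set rather than a single point, so the value of \(w^\star(\lambda)\) -- and hence the hypergradient \(\mathrm{d}f/\mathrm{d}\lambda\) -- depends on which optimum the inner optimizer selects. This is where the algorithmic implicit bias studied in the paper enters.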
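To make the cold-start/warm-start distinction concrete, the following is a minimal JAX sketch of one common setup, in which the hypergradient is approximated by differentiating through K unrolled inner gradient steps. The toy losses inner_loss and outer_loss, the rank-deficient matrix, and all step sizes are our own illustrative choices, not the paper's experimental setup.

import jax
import jax.numpy as jnp

def inner_loss(lam, w):
    # Rank-deficient quadratic: the inner problem has many optima along
    # the null space of A, standing in for overparameterization.
    A = jnp.array([[1.0, 0.0], [0.0, 0.0]])
    return 0.5 * w @ A @ w + lam * jnp.sum(w)

def outer_loss(lam, w):
    return (w[0] - 1.0) ** 2 + lam ** 2

def unrolled_inner(lam, w0, steps=50, lr=0.1):
    # K steps of gradient descent on the inner problem; differentiating
    # through this unroll gives one common hypergradient approximation.
    def step(w, _):
        w = w - lr * jax.grad(inner_loss, argnums=1)(lam, w)
        return w, None
    w, _ = jax.lax.scan(step, w0, None, length=steps)
    return w

def hypergrad(lam, w0):
    # d/d(lam) of the outer loss evaluated at the unrolled inner solution.
    return jax.grad(lambda l: outer_loss(l, unrolled_inner(l, w0)))(lam)

lam, w_init = 0.1, jnp.zeros(2)
w = w_init
for t in range(100):
    # Cold-start: re-solve the inner problem from w_init at every outer step.
    # Warm-start: pass `w` instead of `w_init` below, so each inner solve
    # continues from the previous inner solution.
    lam = lam - 0.01 * hypergrad(lam, w_init)
    w = unrolled_inner(lam, w_init)

The only difference between the two regimes is the initialization handed to each inner solve; with a rank-deficient inner problem as above, cold-start and warm-start can settle on different inner optima that achieve the same inner loss, which is the abstract's point about algorithmic choices shaping the converged solution.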