Paper Title

The Complexity of Constrained Min-Max Optimization

Paper Authors

Constantinos Daskalakis, Stratis Skoulakis, Manolis Zampetakis

Paper Abstract

Despite its important applications in Machine Learning, min-max optimization of nonconvex-nonconcave objectives remains elusive. Not only are there no known first-order methods converging even to approximate local min-max points, but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity of the problem, as well as of the limitations of first-order methods in constrained min-max optimization problems with nonconvex-nonconcave objectives and linear constraints.

As a warm-up, we show that, even when the objective is a Lipschitz and smooth differentiable function, deciding whether a min-max point exists, in fact even deciding whether an approximate min-max point exists, is NP-hard. More importantly, we show that an approximate local min-max point of large enough approximation is guaranteed to exist, but finding one such point is PPAD-complete. The same is true of computing an approximate fixed point of Gradient Descent/Ascent.

An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin model. We show that, given oracle access to some function $f : P \to [-1, 1]$ and its gradient $\nabla f$, where $P \subseteq [0, 1]^d$ is a known convex polytope, every algorithm that finds an $\varepsilon$-approximate local min-max point needs to make a number of queries that is exponential in at least one of $1/\varepsilon$, $L$, $G$, or $d$, where $L$ and $G$ are respectively the smoothness and Lipschitzness of $f$ and $d$ is the dimension. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using $O(L/\varepsilon)$ many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems.
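The abstract refers to approximate fixed points of (projected) Gradient Descent/Ascent as the natural target of first-order methods in this setting. The sketch below only illustrates that dynamic on a toy constrained problem and is not the paper's construction: the objective `f`, the box constraint $P = [0, 1]^2$, the step size `eta`, and the tolerance `eps` are all assumptions chosen for the example.

```python
# Minimal sketch of projected Gradient Descent/Ascent (GDA) on a toy
# nonconvex-nonconcave objective over the box P = [0, 1]^2.
# The objective, step size, and tolerance are illustrative assumptions,
# not the construction analyzed in the paper.
import numpy as np

def f(x, y):
    # Toy nonconvex-nonconcave objective (assumption for illustration only).
    return np.sin(3 * x) * np.cos(3 * y) + (x - 0.5) * (y - 0.5)

def grad_f(x, y):
    # Analytic gradient of the toy objective above.
    dfx = 3 * np.cos(3 * x) * np.cos(3 * y) + (y - 0.5)
    dfy = -3 * np.sin(3 * x) * np.sin(3 * y) + (x - 0.5)
    return dfx, dfy

def project(z):
    # Projection onto [0, 1]; a general polytope P would need a small QP instead.
    return np.clip(z, 0.0, 1.0)

def gda(x0, y0, eta=0.05, eps=1e-4, max_iters=10_000):
    """Run projected GDA and stop when the iterate is an approximate fixed
    point of the GDA map, i.e. the projected step moves it by less than eps."""
    x, y = x0, y0
    for _ in range(max_iters):
        dfx, dfy = grad_f(x, y)
        x_new = project(x - eta * dfx)  # descent step for the min player
        y_new = project(y + eta * dfy)  # ascent step for the max player
        if max(abs(x_new - x), abs(y_new - y)) < eps:
            return x_new, y_new, True   # approximate GDA fixed point found
        x, y = x_new, y_new
    return x, y, False                  # no convergence guarantee: GDA may cycle

if __name__ == "__main__":
    x, y, converged = gda(0.2, 0.8)
    print(f"x={x:.4f}, y={y:.4f}, converged={converged}, f={f(x, y):.4f}")
```

On nonconvex-nonconcave objectives this iteration may cycle or diverge rather than settle at an approximate fixed point, which is consistent with the hardness and query-lower-bound results summarized in the abstract.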
