Paper Title
Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems
Paper Authors
Paper Abstract
Minimax optimization problems have attracted significant attention in recent years due to their widespread application in numerous machine learning models. To solve the minimax problem, a wide variety of stochastic optimization methods have been proposed. However, most of them ignore the distributed setting where the training data is distributed across multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing the variance-reduced gradient, our method achieves $O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$ sample complexity and $O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$ communication complexity for the nonconvex-strongly-concave minimax problem. To the best of our knowledge, our work is the first to achieve such theoretical complexities for this kind of minimax problem. Finally, we apply our method to AUC maximization, and the experimental results confirm the effectiveness of our method.
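To make the decentralized descent-ascent idea concrete, the sketch below runs plain decentralized SGDA with gossip averaging on a toy quadratic strongly-concave minimax problem. Everything in it is illustrative rather than the paper's algorithm: the ring mixing matrix `W`, the toy local objectives, the step sizes `eta_x`/`eta_y`, and the Gaussian gradient noise are all assumptions, and the variance-reduced gradient estimator that underlies the stated complexity bounds is omitted.

```python
import numpy as np

# Hypothetical sketch: decentralized stochastic gradient descent ascent
# (SGDA) with gossip averaging on min_x max_y (1/m) * sum_i f_i(x, y),
# where f_i(x, y) = a_i^T x + x^T y - 0.5 * ||y||^2 (strongly concave in y).
# This is NOT the paper's variance-reduced method, only the basic template.

rng = np.random.default_rng(0)
m, d = 8, 5                              # number of workers, dimension

# Ring-topology mixing matrix (doubly stochastic): self + two neighbors.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

A = rng.normal(size=(m, d))              # local data a_i on each worker
X = np.zeros((m, d))                     # local copies of the min variable x
Y = np.zeros((m, d))                     # local copies of the max variable y
eta_x, eta_y = 0.05, 0.1                 # illustrative step sizes

for t in range(200):
    X, Y = W @ X, W @ Y                  # communication: average with neighbors
    noise = rng.normal(scale=0.01, size=(2, m, d))
    grad_x = A + Y + noise[0]            # stochastic gradient of f_i w.r.t. x
    grad_y = X - Y + noise[1]            # stochastic gradient of f_i w.r.t. y
    X -= eta_x * grad_x                  # descent step on the min variable
    Y += eta_y * grad_y                  # ascent step on the max variable

# Small consensus error indicates the workers' local copies have agreed.
print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

In the paper's setting, the stochastic gradients above would be replaced by variance-reduced estimators built from the $n$ local finite-sum components, which is what yields the $\sqrt{n}$ dependence in the sample complexity; the gossip step is what introduces the $(1-\lambda)^{-2}$ dependence on the network's spectral gap.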