Paper Title
A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning
Paper Authors
Paper Abstract
In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, where the overall objective approximates a true population minimax risk by statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. Then, we focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (Local SGDA), a multiple-local-update descent-ascent algorithm that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that it generally cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $ε$-approximate solution with $\mathcal{O}(\log (1/ε))$ rounds of communication, which matches the time complexity of the centralized GDA method. Finally, we numerically show that FedGDA-GT outperforms Local SGDA.
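To illustrate the gradient-tracking idea behind the abstract (multiple local GDA steps between communication rounds, with each agent's local gradient corrected toward the global gradient), here is a toy sketch. It is not the paper's exact FedGDA-GT algorithm: the quadratic local objectives $f_i(x,y) = \frac{\mu_i}{2}x^2 - \frac{\mu_i}{2}y^2 + xy - a_i x$, the constants (`mus`, `avals`, `eta`, `K`), and the specific correction rule `g = grad_i - grad_i(round start) + global grad(round start)` are all illustrative assumptions. On this strongly-convex-strongly-concave problem the sketch converges linearly to the true saddle point with a constant stepsize, which is the qualitative behavior the abstract claims for FedGDA-GT.

```python
# Toy sketch of gradient tracking for federated minimax (NOT the paper's
# exact FedGDA-GT). Two agents hold heterogeneous quadratic saddle problems
#   f_i(x, y) = (mu_i/2) x^2 - (mu_i/2) y^2 + x*y - a_i*x,
# and run K local descent-ascent steps per round with a tracked gradient.
# All problem data and hyperparameters below are illustrative assumptions.
import numpy as np

mus = [1.0, 3.0]               # heterogeneous local curvatures
avals = [1.0, 0.0]             # heterogeneous linear terms
eta, K, rounds = 0.05, 5, 50   # constant stepsize, local steps, comm rounds

def grad_i(i, x, y):
    """Gradient (d/dx, d/dy) of agent i's local objective."""
    return np.array([mus[i] * x + y - avals[i], -mus[i] * y + x])

def grad_global(x, y):
    """Gradient of the average objective F = (f_0 + f_1) / 2."""
    return sum(grad_i(i, x, y) for i in range(2)) / 2

# True saddle of F: solve  mu_bar*x + y = a_bar  and  x - mu_bar*y = 0.
mu_bar, a_bar = np.mean(mus), np.mean(avals)
ystar = a_bar / (mu_bar ** 2 + 1)
xstar = mu_bar * ystar

z = np.array([5.0, -3.0])      # server iterate (x, y)
for _ in range(rounds):
    g0 = grad_global(*z)       # global gradient at the round's start point
    out = np.zeros(2)
    for i in range(2):
        zi, gi0 = z.copy(), grad_i(i, *z)
        for _ in range(K):     # K local GDA steps with tracked gradient
            g = grad_i(i, *zi) - gi0 + g0        # tracking correction
            zi += eta * np.array([-g[0], g[1]])  # descend in x, ascend in y
        out += zi
    z = out / 2                # server averages the local iterates

print(z, (xstar, ystar))       # final iterate vs. true saddle (0.2, 0.1)
```

The correction term makes the true saddle a fixed point of every agent's local dynamics (at the saddle the tracked gradient vanishes for all agents), which is what removes the client-drift bias that plain multi-local-step methods exhibit under constant stepsizes.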