Paper Title
Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees
Paper Authors
Paper Abstract
We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem wherein each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a {\it linear} rate to an estimate that is within the centralized {\it statistical precision} of the model, $\mathcal{O}(s\log d/N)$. When $s\log d/N = o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $\mathcal{O}(\kappa \log(1/\varepsilon))$ gradient computations and $\mathcal{O}(\kappa/(1-\rho) \log(1/\varepsilon))$ communication rounds, where $\kappa$ is the restricted condition number of the loss function and $\rho$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm, despite the data being distributed, while the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity \& topology, and convergence rate in high dimensions.
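The abstract does not spell out the update equations, but a minimal sketch of the kind of decentralized projected gradient tracking iteration it describes, applied to an $\ell_1$-constrained (LASSO) least-squares problem on a ring of agents, might look as follows. All problem sizes, the ring topology with Metropolis mixing weights, the step size `gamma`, and the oracle choice of the $\ell_1$ radius are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: m agents on a ring, each holding n_i local
# samples of a d-dimensional, s-sparse linear model (all sizes illustrative).
m, n_i, d, s = 5, 40, 100, 5
x_true = np.zeros(d)
x_true[:s] = rng.standard_normal(s)
A = [rng.standard_normal((n_i, d)) for _ in range(m)]
b = [A[i] @ x_true + 0.1 * rng.standard_normal(n_i) for i in range(m)]
radius = np.sum(np.abs(x_true))  # oracle l1-ball radius, for illustration only

def local_grad(i, x):
    """Gradient of the local least-squares loss (1/(2 n_i))||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i]) / n_i

def project_l1_ball(v, r):
    """Euclidean projection onto {x : ||x||_1 <= r} (Duchi et al., 2008)."""
    if np.sum(np.abs(v)) <= r:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > css - r)[0][-1]
    tau = (css[k] - r) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Doubly stochastic mixing matrix for the ring graph (Metropolis weights).
W = np.zeros((m, m))
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = W[i, i] = 1.0 / 3.0

gamma = 0.05                  # step size: assumed tuning, not the paper's choice
x = np.zeros((m, d))          # row i = agent i's iterate
g = np.array([local_grad(i, x[i]) for i in range(m)])
y = g.copy()                  # gradient trackers, initialized at local gradients

for _ in range(500):
    # Consensus mixing + projected gradient step on each agent.
    x_new = np.array([project_l1_ball(W[i] @ x - gamma * y[i], radius)
                      for i in range(m)])
    g_new = np.array([local_grad(i, x_new[i]) for i in range(m)])
    y = W @ y + g_new - g     # track the network-average gradient
    x, g = x_new, g_new

print("sup-norm estimation error:", np.max(np.abs(x.mean(axis=0) - x_true)))
```

In this sketch each iteration costs one local gradient evaluation and one neighbor exchange of the pair $(x_i, y_i)$; schemes of this type can also perform several consensus rounds per gradient step, which would be consistent with the abstract's count of $\mathcal{O}(\kappa/(1-\rho)\log(1/\varepsilon))$ communication rounds against $\mathcal{O}(\kappa\log(1/\varepsilon))$ gradient computations.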