Paper Title
Optimal Robust Linear Regression in Nearly Linear Time
Paper Authors
Paper Abstract
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted. We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian. In both settings, our estimators (a) achieve optimal sample complexities and recovery guarantees up to logarithmic factors and (b) run in nearly linear time ($\tilde{O}(nd / \eta^6)$). Prior to our work, polynomial-time algorithms achieving near-optimal sample complexities were known only in the setting where $X$ is Gaussian with identity covariance and $\epsilon$ is Gaussian, and no linear-time estimators were known for robust linear regression in any setting. Our estimators and their analysis leverage recent developments in the construction of faster algorithms for robust mean estimation to improve runtimes, and refined concentration-of-measure arguments alongside Gaussian rounding techniques to improve statistical sample complexities.
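For concreteness, below is a minimal Python sketch of the $\eta$-corruption model described in the abstract: draw $n$ clean samples from $Y = \langle X, w^* \rangle + \epsilon$, then let an adversary replace an $\eta$ fraction of the $(x, y)$ pairs. It only illustrates how ordinary least squares breaks down under such corruption and does not implement the paper's estimator; the helper name `make_corrupted_regression` and the particular corruption pattern are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's estimator) of the eta-corruption model:
# clean samples y = <x, w*> + eps, after which an adversary replaces an
# eta fraction of (x, y) pairs arbitrarily.
import numpy as np

rng = np.random.default_rng(0)

def make_corrupted_regression(n, d, eta, noise_std=1.0):
    """Draw Gaussian-design samples, then corrupt an eta fraction of them."""
    w_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))                      # identity second moment
    y = X @ w_star + noise_std * rng.normal(size=n)  # y = <x, w*> + eps
    # One simple adversarial choice: shift a fraction of covariates and
    # point their responses at a large wrong value.
    idx = rng.choice(n, size=int(eta * n), replace=False)
    X[idx] = rng.normal(size=(len(idx), d)) + 5.0
    y[idx] = -50.0
    return X, y, w_star

X, y, w_star = make_corrupted_regression(n=20_000, d=50, eta=0.1)

# Ordinary least squares is not robust: a constant fraction of corruptions
# can move the estimate far from w*, which is what robust estimators with
# dimension-independent error guarantees are designed to prevent.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("||w_ols - w*|| =", np.linalg.norm(w_ols - w_star))
```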