论文标题
更高程度的平方符号放松与遗忘的异常值
Higher degree sum-of-squares relaxations robust against oblivious outliers
论文作者
论文摘要
我们考虑$ y = x^*+n $的估计模型,其中$ x^*$是我们希望恢复的$ m $二维信号,而$ n $是对称分布的噪声,除了小$α$分数外,这些噪声可能是无限的。我们介绍了一个算法系列,在轻度假设下,在所有估计问题中都恢复了信号$ x^*$,而存在的所有估计问题总和是在噪声$ n $是高斯时成功恢复信号$ x^*$的。从本质上讲,这足以设计一个方形算法,以解决高斯噪声的估计问题,以获取与对称噪声模型一起使用的算法。我们的框架远远超出了对称噪声模型的先前结果,甚至对对抗性扰动也是强大的。 作为具体的例子,我们研究了两个问题,这些问题没有有效的算法可用于重尾噪声:张量PCA和稀疏的PCA。对于前者,当信噪比至少为$ \ tilde {o}(n^{p/4}/α)$时,我们的算法在多项式时间内恢复了主要成分,而匹配(最多可对数因素)当前最已知的算法保证的算法。对于后者而言,我们的算法在准单行时间内运行,并匹配在高斯噪声的情况下,准确度性时间算法的最新保证。利用种植集团问题的减少,我们提供了证据表明,对于具有对称噪声的稀疏PCA可能需要进行准论时间。 在我们的证据中,我们在封面数量的数量上使用了一组伪预测的范围,我们通过在一组解决方案集合的高斯复杂性上认证上的上限来获得。这种界限伪造期望的覆盖数量的方法本身可能很有趣,并且可能会在未来的工作中找到其他应用程序。
We consider estimation models of the form $Y=X^*+N$, where $X^*$ is some $m$-dimensional signal we wish to recover, and $N$ is symmetrically distributed noise that may be unbounded in all but a small $α$ fraction of the entries. We introduce a family of algorithms that under mild assumptions recover the signal $X^*$ in all estimation problems for which there exists a sum-of-squares algorithm that succeeds in recovering the signal $X^*$ when the noise $N$ is Gaussian. This essentially shows that it is enough to design a sum-of-squares algorithm for an estimation problem with Gaussian noise in order to get the algorithm that works with the symmetric noise model. Our framework extends far beyond previous results on symmetric noise models and is even robust to adversarial perturbations. As concrete examples, we investigate two problems for which no efficient algorithms were known to work for heavy-tailed noise: tensor PCA and sparse PCA. For the former, our algorithm recovers the principal component in polynomial time when the signal-to-noise ratio is at least $\tilde{O}(n^{p/4}/α)$, that matches (up to logarithmic factors) current best known algorithmic guarantees for Gaussian noise. For the latter, our algorithm runs in quasipolynomial time and matches the state-of-the-art guarantees for quasipolynomial time algorithms in the case of Gaussian noise. Using a reduction from the planted clique problem, we provide evidence that the quasipolynomial time is likely to be necessary for sparse PCA with symmetric noise. In our proofs we use bounds on the covering numbers of sets of pseudo-expectations, which we obtain by certifying in sum-of-squares upper bounds on the Gaussian complexities of sets of solutions. This approach for bounding the covering numbers of sets of pseudo-expectations may be interesting in its own right and may find other application in future works.