Title


On the well-spread property and its relation to linear regression

Authors

Hongjie Chen, Tommaso d'Orsi

Abstract


We consider the robust linear regression model $\boldsymbol{y} = X\beta^* + \boldsymbol{\eta}$, where an adversary oblivious to the design $X \in \mathbb{R}^{n \times d}$ may choose $\boldsymbol{\eta}$ to corrupt all but a (possibly vanishing) fraction of the observations $\boldsymbol{y}$ in an arbitrary way. Recent work [dLN+21, dNS21] has introduced efficient algorithms for consistent recovery of the parameter vector. These algorithms crucially rely on the design matrix being well-spread (a matrix is well-spread if its column span is far from any sparse vector). In this paper, we show that there exists a family of design matrices lacking well-spreadness such that consistent recovery of the parameter vector in the above robust linear regression model is information-theoretically impossible. We further investigate the average-case time complexity of certifying well-spreadness of random matrices. We show that it is possible to efficiently certify whether a given $n$-by-$d$ Gaussian matrix is well-spread if the number of observations is quadratic in the ambient dimension. We complement this result by showing rigorous evidence -- in the form of a lower bound against low-degree polynomials -- of the computational hardness of this same certification problem when the number of observations is $o(d^2)$.
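To make the well-spread property concrete, here is a minimal illustrative sketch (not from the paper; all parameter values are arbitrary choices): it samples random unit vectors in the column span of a Gaussian design and measures how much of their $\ell_2$ mass can sit on any small coordinate set. Large leftover mass is consistent with the span being far from sparse vectors; note that random sampling only probes the property and is not a certificate, which is exactly the certification question the abstract addresses.

```python
import numpy as np

# Illustrative probe of the well-spread property of a Gaussian design
# X in R^{n x d}: a matrix is well-spread if every unit vector v in its
# column span is far from every sparse vector, i.e. v cannot concentrate
# its l2 mass on a small set of coordinates.
rng = np.random.default_rng(0)
n, d, k, trials = 400, 20, 20, 200  # k = sparsity level being probed

X = rng.standard_normal((n, d))
Q, _ = np.linalg.qr(X)  # orthonormal basis for span(X)

worst = 1.0
for _ in range(trials):
    c = rng.standard_normal(d)
    v = Q @ (c / np.linalg.norm(c))        # random unit vector in span(X)
    top_k_mass = np.sort(v**2)[-k:].sum()  # l2 mass on the k largest coords
    worst = min(worst, 1.0 - top_k_mass)   # mass forced off every top-k set

# `worst` close to 1 suggests the sampled span vectors are far from
# k-sparse; an actual certificate must bound this over ALL span vectors.
print(f"min off-support mass over sampled span vectors: {worst:.3f}")
```

Since Gaussian vectors spread their mass nearly uniformly over the $n$ coordinates, the sampled off-support mass stays bounded away from zero here; the paper's hardness result concerns efficiently certifying such a bound for the whole span when $n = o(d^2)$.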
