Paper title
All-In-One Robust Estimator of the Gaussian Mean
Paper authors
Paper abstract
The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in a time which is at most polynomial in dimension, sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant by translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to $0.5$, and a nearly-minimax-rate-breakdown point approximately equal to $0.28$. Fourth, it is minimax rate optimal, up to a logarithmic factor, when data consists of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the rate of contamination tends to zero. The estimator is obtained by an iterative reweighting approach. Each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix.
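The iterative reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract says only that the weights are updated by solving a convex optimization problem, and the sketch below swaps in a much simpler surrogate, namely a linear program over a capped simplex that down-weights points with large squared projections onto the top eigenvector of the current weighted covariance. The function names (`capped_simplex_lp`, `iteratively_reweighted_mean`), the weight cap `1 / (n * (1 - eps))`, the stopping rule, and the assumption that the contamination rate `eps` is known are all choices made for illustration.

```python
import numpy as np


def capped_simplex_lp(scores, cap):
    """Minimize sum(w * scores) subject to sum(w) = 1 and 0 <= w_i <= cap.

    The optimum is greedy: give the largest allowed weight to the points
    with the smallest scores until the total weight reaches 1.
    """
    w = np.zeros(scores.shape[0])
    remaining = 1.0
    for i in np.argsort(scores):
        w[i] = min(cap, remaining)
        remaining -= w[i]
        if remaining <= 0:
            break
    return w


def iteratively_reweighted_mean(X, eps, n_iter=10):
    """Simplified reweighted mean (illustrative surrogate, see lead-in).

    X   : (n, d) array of observations, some of which may be outliers.
    eps : assumed upper bound on the contamination rate, 0 <= eps < 1.
    """
    n = X.shape[0]
    cap = 1.0 / (n * (1.0 - eps))   # hypothetical per-point weight cap
    w = np.full(n, 1.0 / n)         # start from uniform weights
    best_mu, best_lam = None, np.inf
    for _ in range(n_iter):
        mu = w @ X                                   # weighted mean
        centered = X - mu
        cov = (centered * w[:, None]).T @ centered   # weighted covariance
        eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
        if eigvals[-1] >= best_lam:
            break                                    # no further improvement
        best_mu, best_lam = mu, eigvals[-1]
        # Squared projections onto the direction of largest weighted variance.
        scores = (centered @ eigvecs[:, -1]) ** 2
        w = capped_simplex_lp(scores, cap)
    return best_mu


# Usage: standard Gaussian inliers with true mean 0, plus a cluster of outliers.
rng = np.random.default_rng(0)
inliers = rng.standard_normal((450, 5))
outliers = np.full((50, 5), 10.0)
data = np.vstack([inliers, outliers])
print("sample mean error:    ", np.linalg.norm(data.mean(axis=0)))
print("reweighted mean error:", np.linalg.norm(iteratively_reweighted_mean(data, eps=0.1)))
```

Fixing a single direction per iteration makes the weight update a closed-form trimming step, which keeps the sketch short but gives up the guarantees claimed in the abstract; the estimator in the paper updates the weights through a genuine convex program.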