Paper Title
Estimating High-dimensional Covariance and Precision Matrices under General Missing Dependence
Paper Authors
Paper Abstract
The sample covariance matrix $\boldsymbol{S}$ of completely observed data is the key statistic in a large variety of multivariate statistical procedures, such as structured covariance/precision matrix estimation, principal component analysis, and testing equality of mean vectors. However, when the data are only partially observed, the sample covariance matrix computed from the available data is biased and does not yield valid multivariate procedures. To correct the bias, a simple adjustment method called inverse probability weighting (IPW) has been used in previous research, yielding the IPW estimator. The estimator plays the role of $\boldsymbol{S}$ in the missing data context, so it can be plugged into off-the-shelf multivariate procedures. However, theoretical properties (e.g. concentration) of the IPW estimator have been established only under very simple missing structures, in which every variable of each sample is independently subject to missingness with equal probability. We investigate the deviation of the IPW estimator when observations are partially observed under general missing dependency. We prove the optimal convergence rate $O_p(\sqrt{\log p / n})$ of the IPW estimator in the element-wise maximum norm. We also derive similar deviation results when the implicit assumptions (known mean and/or known missing probability) are relaxed. The optimal rate is especially crucial in estimating a precision matrix, because of the "meta-theorem" stating that the rate of the IPW estimator governs that of the resulting precision matrix estimator. In the simulation study, we discuss the possible lack of positive semi-definiteness of the IPW estimator and compare the estimator with imputation methods, two issues of practical importance.
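As a rough illustration of the inverse probability weighting idea described in the abstract, the sketch below rescales each entry of the zero-filled second-moment matrix by the probability that both variables are observed. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ipw_covariance`, the use of `NaN` to mark missing entries, the known zero mean, and the fallback that estimates the joint observation probabilities by their empirical frequencies are all assumptions made here for illustration.

```python
import numpy as np


def ipw_covariance(X, pi=None):
    """Hypothetical IPW-style covariance estimate for data with missing entries.

    X  : (n, p) array; np.nan marks a missing entry. The mean is assumed
         to be known and equal to zero (an assumption of this sketch).
    pi : optional (p, p) array, pi[j, k] = probability that variables j and k
         are both observed. If None, it is replaced by the empirical
         frequency of joint observation (also an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    observed = ~np.isnan(X)                # observation indicators
    Y = np.where(observed, X, 0.0)         # zero-fill the missing entries
    if pi is None:
        D = observed.astype(float)
        pi = D.T @ D / n                   # empirical joint observation rates
    # Entry (j, k): average of observed products, reweighted by 1 / pi[j, k].
    # A zero in pi (a pair never observed together) would make this undefined.
    return (Y.T @ Y / n) / pi


if __name__ == "__main__":
    X = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [3.0, np.nan],
                  [5.0, 6.0]])
    S = ipw_covariance(X)
    print(S)
    print(np.linalg.eigvalsh(S))           # inspect the eigenvalues
```

Because each entry is reweighted separately, the resulting matrix is symmetric but need not be positive semi-definite, which is the issue the abstract raises for the simulation study. Inspecting the smallest eigenvalue, as in the usage lines above, is one simple way to check this before passing the estimate to a downstream procedure.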