Paper Title
A Decorrelating and Debiasing Approach to Simultaneous Inference for High-Dimensional Confounded Models
Paper Authors
Paper Abstract
Motivated by simultaneous association analysis in the presence of latent confounders, this paper studies the large-scale hypothesis testing problem for high-dimensional confounded linear models with both non-asymptotic and asymptotic false discovery control. Such models cover a wide range of practical settings in which both the response and the predictors may be confounded. In the presence of high-dimensional predictors and unobservable confounders, simultaneous inference with provable guarantees becomes highly challenging, and the unknown strong dependence among the confounded covariates makes the challenge even more pronounced. This paper first introduces a decorrelating procedure that shrinks the confounding effect and weakens the correlations among the predictors, and then performs debiasing under the decorrelated design based on a biased initial estimator. An asymptotic normality result for the debiased estimator is then established, and standardized test statistics are constructed. Furthermore, a simultaneous inference procedure is proposed to identify significant associations, and both finite-sample and asymptotic false discovery bounds are provided. The non-asymptotic result is general and model-free, and is of independent interest. We also prove that, under a minimal signal strength condition, all associations can be successfully detected with probability tending to one. Simulation and real data studies are carried out to evaluate the performance of the proposed approach and to compare it with competing methods.
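To make the final step of the abstract concrete, the sketch below shows one generic way to turn standardized test statistics into a set of discoveries with false discovery control. It is not the paper's procedure: the abstract does not specify the thresholding rule, so the classical Benjamini-Hochberg step-up rule is swapped in here as an assumption. The function names, the toy statistics in `z_stats`, and the level `alpha=0.1` are all hypothetical, and the statistics are assumed to be approximately standard normal under the null.

```python
import math


def two_sided_p_values(z_stats):
    """Two-sided p-values for statistics assumed to be N(0, 1) under the null."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_stats]


def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule.

    This is a stand-in for the paper's simultaneous inference procedure,
    which the abstract does not describe in detail.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical standardized statistics for eight predictors.
z_stats = [4.2, -0.3, 3.6, 0.8, -5.1, 1.1, 0.2, -0.9]
p_values = two_sided_p_values(z_stats)
rejected = benjamini_hochberg(p_values, alpha=0.1)
print(rejected)
```

The Benjamini-Hochberg rule is known to control the false discovery rate under independence or positive dependence among the test statistics, which gives some intuition for why weakening the correlations among predictors before testing is useful.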