Paper Title
Confidence Band Estimation for Survival Random Forests
Authors
Abstract
Survival random forest is a popular machine learning tool for modeling censored survival data. However, there is currently no statistically valid and computationally feasible approach for estimating its confidence band. This paper proposes an unbiased confidence band estimation method by extending recent developments in infinite-order incomplete U-statistics. The idea is to estimate the variance-covariance matrix of the cumulative hazard function prediction on a grid of time points. We then generate the confidence band by viewing the cumulative hazard function estimate as a Gaussian process whose distribution can be approximated through simulation. This approach is computationally easy to implement when the subsampling size of a tree is no larger than half of the total training sample size. Numerical studies show that our proposed method accurately estimates the confidence band and achieves the desired coverage rate. We apply this method to the Veterans' Administration lung cancer data.
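To make the simulation step concrete, the following is a minimal sketch of how a simultaneous band could be built once a cumulative hazard estimate and its covariance matrix on a time grid are available. It is not the authors' implementation: the function name `simulate_confidence_band`, the toy values, the AR(1)-style correlation structure, and the clipping of the lower band at zero are all assumptions made for illustration. The covariance matrix, which the paper estimates via incomplete U-statistics, is simply taken as given here.

```python
import numpy as np

def simulate_confidence_band(chf_hat, cov, alpha=0.05, n_sim=10000, seed=0):
    """Simultaneous band from Gaussian simulation (hypothetical helper).

    chf_hat : estimated cumulative hazard on a grid of time points
    cov     : assumed-given covariance matrix of chf_hat (positive diagonal)
    """
    chf_hat = np.asarray(chf_hat, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))  # pointwise standard errors

    rng = np.random.default_rng(seed)
    # Draw mean-zero Gaussian paths with the given covariance.
    draws = rng.multivariate_normal(np.zeros(len(chf_hat)), cov, size=n_sim)

    # For each path, take the largest standardized absolute deviation on the grid.
    max_dev = np.max(np.abs(draws) / sd, axis=1)

    # Critical value: the (1 - alpha) quantile of that maximum.
    crit = np.quantile(max_dev, 1 - alpha)

    # Clipping at zero is a choice of this sketch (a cumulative hazard is nonnegative).
    lower = np.maximum(chf_hat - crit * sd, 0.0)
    upper = chf_hat + crit * sd
    return lower, upper, crit

# Toy inputs (made up for illustration, not taken from the paper).
chf_hat = np.array([0.20, 0.45, 0.70, 1.00])
sd = np.array([0.05, 0.08, 0.10, 0.15])
idx = np.arange(len(sd))
corr = 0.8 ** np.abs(np.subtract.outer(idx, idx))  # assumed correlation structure
cov = corr * np.outer(sd, sd)

lower, upper, crit = simulate_confidence_band(chf_hat, cov)
print("critical value:", crit)
print("lower:", lower)
print("upper:", upper)
```

Because the critical value is the quantile of the maximum deviation over the whole grid, it is larger than the pointwise normal quantile, which is what makes the resulting band simultaneous rather than pointwise.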