Paper Title
Distribution-Free Finite-Sample Guarantees and Split Conformal Prediction
Paper Authors
Abstract
Modern black-box predictive models are often accompanied by weak performance guarantees that either hold only asymptotically in the size of the dataset or require strong parametric assumptions. In response, split conformal prediction offers a promising way to obtain finite-sample guarantees under minimal, distribution-free assumptions. Although prediction set validity most often concerns marginal coverage, we explore the related but distinct guarantee of tolerance regions, reformulating known results in the language of nested prediction sets and extending the duality between marginal coverage and tolerance regions. Furthermore, we highlight the connection between split conformal prediction and classical tolerance predictors developed in the 1940s, as well as recent developments in distribution-free risk control. One result that transfers from classical tolerance predictors is that the coverage of a prediction set based on order statistics, conditional on the calibration set, is a random variable whose distribution stochastically dominates a Beta distribution. We demonstrate the empirical effectiveness of our findings on synthetic and real datasets using a popular split conformal prediction procedure called conformalized quantile regression (CQR).
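A minimal sketch of the order-statistic construction summarized above, written for illustration and not taken from the paper. It assumes i.i.d. (hence exchangeable) calibration and test scores with no ties, and uses the standard split-conformal rank k = ceil((n + 1)(1 - alpha)): the prediction set keeps every candidate whose score is at most the k-th smallest calibration score. Under these assumptions the coverage conditional on the calibration set follows exactly a Beta(k, n + 1 - k) distribution (with ties it can only stochastically dominate this law). Its mean, k/(n + 1), gives the marginal guarantee. Choosing a larger k so that P(coverage >= 1 - eps) >= 1 - delta gives an (eps, delta) tolerance region. Because the probability integral transform maps any continuous score to Uniform(0, 1), the simulation draws uniform scores directly. All function names are hypothetical. The CQR part replaces fitted quantile regressors with a fixed band [x - 1, x + 1], purely as an assumption for the toy example.

```python
import math

import numpy as np


def conformal_rank(n: int, alpha: float) -> int:
    """Rank k of the calibration order statistic used as threshold, k = ceil((n + 1)(1 - alpha))."""
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    return k


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for positive integers a, b via the order-statistic identity
    P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def tolerance_rank(n: int, eps: float, delta: float) -> int:
    """Smallest rank k with P(Beta(k, n + 1 - k) <= 1 - eps) <= delta, so the set
    covers at least 1 - eps with probability >= 1 - delta over calibration draws."""
    for k in range(1, n + 1):
        if beta_cdf_int(1 - eps, k, n + 1 - k) <= delta:
            return k
    raise ValueError("calibration set too small for this (eps, delta)")


def cqr_scores(y, lo, hi):
    """CQR score: positive distance of y outside [lo, hi], negative when inside."""
    return np.maximum(lo - y, y - hi)


def cqr_interval(lo, hi, cal_scores, k):
    """Widen the predicted quantile band by the k-th smallest calibration score."""
    q = np.sort(cal_scores)[k - 1]
    return lo - q, hi + q


rng = np.random.default_rng(0)
n, alpha, trials = 100, 0.1, 20_000
k = conformal_rank(n, alpha)  # ceil(101 * 0.9) = 91

# Each row is one calibration set of Uniform(0, 1) scores; the set {s <= s_(k)}
# then covers a fresh score with probability exactly s_(k).
order_stats = np.sort(rng.random((trials, n)), axis=1)
cov = order_stats[:, k - 1]
print(cov.mean(), k / (n + 1))  # both close to 0.901
print(np.mean(cov <= 0.9), beta_cdf_int(0.9, k, n + 1 - k))  # empirical vs. exact Beta CDF

# (eps, delta) tolerance region: a larger rank buys a high-probability guarantee.
k_tol = tolerance_rank(n, eps=0.1, delta=0.05)
cov_tol = order_stats[:, k_tol - 1]
print(k_tol, np.mean(cov_tol < 0.9))  # miss rate should be at most about 0.05

# Hypothetical CQR toy: y = x + N(0, 1) noise and a too-narrow "quantile model"
# predicting [x - 1, x + 1] (about 68% coverage before calibration).
x_cal, x_test = rng.uniform(0, 5, n), rng.uniform(0, 5, trials)
y_cal = x_cal + rng.normal(size=n)
y_test = x_test + rng.normal(size=trials)
scores = cqr_scores(y_cal, x_cal - 1, x_cal + 1)
lo, hi = cqr_interval(x_test - 1, x_test + 1, scores, k)
cqr_cov = np.mean((lo <= y_test) & (y_test <= hi))
print(cqr_cov)  # near 0.9 for a typical calibration draw
```

The Beta CDF is evaluated through the binomial identity so that the sketch needs only NumPy and the standard library. If SciPy is available, `scipy.stats.beta.cdf(x, a, b)` should return the same values.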