Paper Title

Approximate Cross-validation: Guarantees for Model Assessment and Selection

Paper Authors

Ashia Wilson, Maximilian Kasy, Lester Mackey

Abstract

Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. Recent work in empirical risk minimization (ERM) approximates the expensive refitting with a single Newton step warm-started from the full training set optimizer. While this can greatly reduce runtime, several open questions remain including whether these approximations lead to faithful model selection and whether they are suitable for non-smooth objectives. We address these questions with three main contributions: (i) we provide uniform non-asymptotic, deterministic model assessment guarantees for approximate CV; (ii) we show that (roughly) the same conditions also guarantee model selection performance comparable to CV; (iii) we provide a proximal Newton extension of the approximate CV framework for non-smooth prediction problems and develop improved assessment guarantees for problems such as l1-regularized ERM.
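The approximation described above replaces each fold's refit with a single Newton step warm-started at the full-data optimizer. A minimal sketch of this idea for leave-one-out CV with l2-regularized logistic regression is below; it is an illustration under assumed names (`fit_full`, `approx_loo`), not the paper's implementation, and it omits the rank-one Hessian updates that make the real method fast.

```python
# Sketch of approximate leave-one-out CV (assumed helper names, not the
# paper's code): refitting on each leave-one-out dataset is replaced by
# one Newton step on the leave-one-out objective, warm-started at the
# full-data optimizer theta_hat.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_full(X, y, lam, iters=100):
    """Newton's method on the full-data l2-regularized logistic loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        W = p * (1 - p)
        H = (X * W[:, None]).T @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def heldout_loss(x, y, theta):
    """Logistic loss of theta on a single held-out point."""
    z = x @ theta
    return np.log1p(np.exp(-z)) if y == 1 else np.log1p(np.exp(z))

def approx_loo(X, y, lam):
    """Approximate LOO-CV: one Newton step per left-out point."""
    n, d = X.shape
    theta_hat = fit_full(X, y, lam)  # warm start shared by all folds
    losses = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        p = sigmoid(Xi @ theta_hat)
        grad = Xi.T @ (p - yi) / (n - 1) + lam * theta_hat
        W = p * (1 - p)
        H = (Xi * W[:, None]).T @ Xi / (n - 1) + lam * np.eye(d)
        theta_i = theta_hat - np.linalg.solve(H, grad)  # single Newton step
        losses[i] = heldout_loss(X[i], y[i], theta_i)
    return losses.mean()
```

Because the full-data optimizer is already a near-minimizer of each leave-one-out objective, one Newton step typically lands very close to the exact leave-one-out solution, which is what the paper's assessment guarantees quantify.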
