Paper title
Predictive Multiplicity in Probabilistic Classification
Paper authors
Abstract
Machine learning models are often used to inform real-world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting whether a person will appear in court. Given multiple models that perform almost equally well on a prediction task, to what extent do predictions vary across these models? If predictions are relatively consistent across similar models, then the standard approach of choosing the model that optimizes a penalized loss suffices. But what if predictions vary significantly across similar models? In machine learning, this is referred to as predictive multiplicity, i.e., the prevalence of conflicting predictions assigned by near-optimal competing models. In this paper, we present a framework for measuring predictive multiplicity in probabilistic classification (predicting the probability of a positive outcome). We introduce measures that capture the variation in risk estimates over the set of competing models, and develop optimization-based methods to compute these measures efficiently and reliably for convex empirical risk minimization problems. We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks. Further, we provide insight into how predictive multiplicity arises by analyzing the relationship between predictive multiplicity and dataset characteristics (outliers, separability, and majority-minority structure). Our results emphasize the need to report predictive multiplicity more widely.
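To make the idea of variation in risk estimates over a set of competing models concrete, the following is a minimal sketch, not the method developed in the paper. It assumes an unpenalized logistic regression on synthetic data, and it approximates the set of near-optimal models by randomly perturbing the fitted weights and keeping those whose loss stays within a tolerance `epsilon` of the best loss. The paper instead computes its measures with optimization-based methods; random sampling can only underestimate the true spread. All names here (`prediction_ranges`, `epsilon`, `scale`) are hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Mean logistic loss; clipping avoids log(0).
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.5, steps=2000):
    # Plain gradient descent on the convex logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def prediction_ranges(X, y, epsilon=0.01, n_samples=2000, scale=0.3, seed=0):
    """Per-example min and max predicted risk over sampled near-optimal models.

    Assumption: "near-optimal" means loss <= (1 + epsilon) * best loss.
    """
    rng = np.random.default_rng(seed)
    w_star = fit_logistic(X, y)
    budget = (1 + epsilon) * log_loss(w_star, X, y)
    # Candidate models: Gaussian perturbations of the fitted weights.
    candidates = w_star + scale * rng.standard_normal((n_samples, X.shape[1]))
    kept = [w for w in candidates if log_loss(w, X, y) <= budget]
    kept.append(w_star)  # the baseline model is always in the set
    probs = sigmoid(X @ np.array(kept).T)  # shape: (n_examples, n_models)
    return probs.min(axis=1), probs.max(axis=1), len(kept)

# Synthetic, non-separable data with an intercept column (illustrative only).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 2))])
y = (rng.random(200) < sigmoid(X @ np.array([0.3, 1.5, -1.0]))).astype(float)

lo, hi, n_kept = prediction_ranges(X, y)
print("near-optimal models kept:", n_kept)
print("largest per-example spread in predicted risk:", float((hi - lo).max()))
```

A wide gap between `hi` and `lo` for an individual means that models with nearly identical overall performance assign that person noticeably different risk estimates, which is the situation the abstract describes as predictive multiplicity.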