Paper title
Data Uncertainty without Prediction Models
Paper authors
Abstract
Data acquisition for machine learning is often costly. To build a high-performance prediction model from fewer data points, the degree of difficulty in prediction is often used as the acquisition function for adding a new data point. This degree of difficulty is referred to as the uncertainty of the prediction model. We propose an uncertainty estimation method, named Distance-weighted Class Impurity, that does not explicitly use a prediction model. The method estimates the uncertainty at a location from the distances and class impurities around that location. We compared it on active learning tasks with several uncertainty estimation methods that are based on prediction models, and verified that Distance-weighted Class Impurity works effectively regardless of the prediction model.
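The abstract does not state the formula behind Distance-weighted Class Impurity, so the following is only a minimal sketch of one way such a model-free score could look, not the authors' definition. It assumes a Gaussian kernel of Euclidean distance as the weighting and Gini impurity as the impurity measure; the function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math


def distance_weighted_class_impurity(query, points, labels, bandwidth=1.0):
    """Hypothetical score: Gini impurity of the labels around `query`,
    where each labeled point counts in proportion to a Gaussian kernel
    of its Euclidean distance to `query` (both choices are assumptions)."""
    class_weight = {}
    for point, label in zip(points, labels):
        dist = math.dist(query, point)
        weight = math.exp(-(dist ** 2) / (2.0 * bandwidth ** 2))
        class_weight[label] = class_weight.get(label, 0.0) + weight

    total = sum(class_weight.values())
    if total == 0.0:
        # No labeled point carries any weight here; this sketch treats
        # such a location as maximally uncertain (an assumption).
        return 1.0

    # Gini impurity of the distance-weighted class proportions.
    return 1.0 - sum((w / total) ** 2 for w in class_weight.values())


def select_next(candidates, points, labels, bandwidth=1.0):
    """Active-learning step: index of the candidate with the highest score."""
    scores = [
        distance_weighted_class_impurity(c, points, labels, bandwidth)
        for c in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy data: two labeled clusters and three unlabeled candidates.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]
candidates = [(0.0, 0.5), (2.0, 0.5), (4.0, 0.5)]
next_index = select_next(candidates, points, labels)
```

In this toy setup the candidate midway between the two clusters sees both classes with equal weight, so it receives the highest score and is the one chosen for labeling. No classifier is trained at any point, which is the property the abstract emphasizes.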