Paper title
On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers
Paper authors
Abstract
Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc calibration methods which fit some family of calibration maps on a validation dataset. In contrast, evaluation of calibration with the expected calibration error (ECE) on the test set does not explicitly involve fitting. However, as we demonstrate, ECE can still be viewed as if fitting a family of functions on the test data. This motivates the fit-on-the-test view on evaluation: first, approximate a calibration map on the test data, and second, quantify its distance from the identity. Exploiting this view allows us to unlock missed opportunities: (1) use the plethora of post-hoc calibration methods for evaluating calibration; (2) tune the number of bins in ECE with cross-validation. Furthermore, we introduce: (3) benchmarking on pseudo-real data where the true calibration map can be estimated very precisely; and (4) novel calibration and evaluation methods using new calibration map families PL and PL3.
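
The two-step reading of ECE described in the abstract can be illustrated with a minimal sketch. It assumes equal-width bins and a function family made of per-bin shifts of the identity, p + delta_b, fitted by least squares on the test data. This family is one construction that reproduces binned ECE exactly and is not necessarily the formulation used in the paper. The names `bin_index`, `ece_binned`, and `ece_fit_on_test`, the bin-edge convention, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def bin_index(conf, n_bins):
    # Equal-width bins on [0, 1], half-open as [lo, hi); conf == 1.0 goes to the last bin.
    # (Assumed convention; other ECE implementations use (lo, hi] bins.)
    return np.minimum((conf * n_bins).astype(int), n_bins - 1)

def ece_binned(conf, correct, n_bins=15):
    # Classic binned ECE: weighted mean of |accuracy - mean confidence| per bin.
    idx = bin_index(conf, n_bins)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def ece_fit_on_test(conf, correct, n_bins=15):
    idx = bin_index(conf, n_bins)
    # Step 1: fit a calibration map on the test data.
    # Within bin b the map is p + shift[b]; the least-squares shift is mean(correct - conf).
    shift = np.zeros(n_bins)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            shift[b] = (correct[mask] - conf[mask]).mean()
    fitted = conf + shift[idx]
    # Step 2: quantify the fitted map's mean absolute distance from the identity.
    return np.abs(fitted - conf).mean()

# Synthetic, overconfident predictions (illustrative data only).
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=5000)
correct = (rng.uniform(size=5000) < conf ** 2).astype(float)

print(ece_binned(conf, correct), ece_fit_on_test(conf, correct))
```

Under these assumptions the two functions return the same value up to floating-point error, because every point in bin b sits at distance |accuracy_b - confidence_b| from the identity. Once the evaluation is phrased this way, the fitting step could in principle be swapped for another calibration-map family, and the number of bins becomes a hyperparameter of the fit that could be selected by cross-validation.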