Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Title

Sample-dependent Adaptive Temperature Scaling for Improved Calibration

Authors

Joy, Tom; Pinto, Francesco; Lim, Ser-Nam; Torr, Philip H. S.; Dokania, Puneet K.

Abstract

It is now well known that neural networks can be wrong with high confidence in their predictions, leading to poor calibration. The most common post-hoc remedy is temperature scaling, which adjusts the confidence of the prediction on every input by scaling the logits by a single fixed value. Whilst this approach typically improves the average calibration across the whole test dataset, it does so by reducing the individual confidences of the predictions, regardless of whether a given input is classified correctly or incorrectly. Building on this insight, we base our method on the observation that different samples contribute to the calibration error by varying amounts: some need their confidence increased and others need it decreased. We therefore propose to predict a different temperature value for each input, which allows the mismatch between confidence and accuracy to be adjusted at a finer granularity. We also observe improved out-of-distribution (OOD) detection results and can extract a notion of hardness for individual data points. Because the method is applied post-hoc to off-the-shelf pre-trained classifiers, it requires very little computation time and has a negligible memory footprint. We test it on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets, showing that producing per-data-point temperatures also benefits the expected calibration error across the whole test set. Code is available at: https://github.com/thwjoy/adats.
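The contrast the abstract draws between one global temperature and per-input temperatures can be illustrated with a minimal pure-Python sketch. This is not the authors' code (that lives in the linked repository): the helper names `softmax`, `confidence` and `expected_calibration_error`, the 10-bin equal-width ECE, the toy logits, and the hand-picked temperatures are all assumptions for illustration. In the paper the per-sample temperature is predicted for each input; here it is simply supplied. Dividing logits by any T > 0 never changes the predicted class, and a larger T never raises the top-class probability, so a single T > 1 lowers the confidence of correct and incorrect predictions alike.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: Sequence[float], temperature: float) -> Tuple[float, int]:
    """Return (top-class probability, predicted class); T > 0 never changes the argmax."""
    probs = softmax(logits, temperature)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return probs[pred], pred


def expected_calibration_error(all_logits, labels, temperatures, n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins; `temperatures` holds one T per sample.
    Passing the same T for every sample reproduces ordinary (global) temperature scaling."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for z, y, t in zip(all_logits, labels, temperatures):
        conf, pred = confidence(z, t)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += float(pred == y)
    n = len(labels)
    # sum_b (|B_b| / n) * |acc_b - conf_b|  ==  sum_b |correct_b - conf_b| / n
    return sum(abs(correct_sums[b] - conf_sums[b]) / n
               for b in range(n_bins) if counts[b] > 0)


if __name__ == "__main__":
    # Toy data: sample 0 is classified correctly; sample 1 predicts class 0 but its label is 1.
    logits = [[4.0, 0.0, 0.0], [1.0, 0.8, 0.0]]
    labels = [0, 1]

    # Global temperatures: T = 3 lowers both confidences, including the correct one's.
    print(expected_calibration_error(logits, labels, [1.0, 1.0]))  # ≈ 0.246
    print(expected_calibration_error(logits, labels, [3.0, 3.0]))  # ≈ 0.361

    # Hypothetical per-sample temperatures: sharpen the correct sample (T < 1)
    # and soften the incorrect one (T > 1).
    print(expected_calibration_error(logits, labels, [0.5, 3.0]))  # ≈ 0.189
```

On this toy pair, raising the correct prediction's confidence while lowering the incorrect one's gives a lower ECE than either global setting tried, which is the intuition behind predicting temperatures per input. With only two samples the binned ECE is very sensitive to bin boundaries, so this shows the mechanism rather than a general result.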