Paper Title

Improving Calibration through the Relationship with Adversarial Robustness

Authors

Yao Qin, Xuezhi Wang, Alex Beutel, Ed H. Chi

Abstract

Neural networks lack adversarial robustness, i.e., they are vulnerable to adversarial examples: small perturbations to the input that cause incorrect predictions. Further, trust is undermined when models give miscalibrated predictions, i.e., when the predicted probability is not a good indicator of how much we should trust the model. In this paper, we study the connection between adversarial robustness and calibration, and find that inputs for which the model is sensitive to small perturbations (i.e., easily attacked) are more likely to have poorly calibrated predictions. Based on this insight, we examine whether calibration can be improved by addressing those adversarially unrobust inputs. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS), which integrates the correlation between adversarial robustness and calibration into training by adaptively softening the labels of an example based on how easily it can be attacked by an adversary. We find that our method, by taking the adversarial robustness of the in-distribution data into consideration, yields better-calibrated models even under distributional shift. In addition, AR-AdaLS can also be applied to ensemble models to further improve calibration.
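To make the core idea concrete, here is a minimal sketch of adaptive label smoothing driven by a per-example vulnerability score. This is an illustration only, not the paper's implementation: AR-AdaLS's actual procedure (e.g., how robustness is measured and how smoothing is assigned during training) is not specified in the abstract, and the `vulnerability` input and `max_smoothing` parameter here are hypothetical stand-ins.

```python
import numpy as np

def soften_labels(one_hot, vulnerability, max_smoothing=0.2):
    """Adaptively soften one-hot labels: examples that are easier to
    attack (higher vulnerability) receive more label smoothing.

    one_hot:       (N, C) array of one-hot labels.
    vulnerability: (N,) array in [0, 1]; a stand-in for "how easily
                   this example can be attacked by an adversary".
    max_smoothing: smoothing applied to the most vulnerable example
                   (hypothetical hyperparameter, not from the paper).
    """
    n_classes = one_hot.shape[1]
    # Per-example smoothing amount, scaled by vulnerability.
    eps = (max_smoothing * vulnerability)[:, None]
    # Standard label-smoothing formula, with per-example epsilon:
    # robust examples (eps ~ 0) keep hard labels, vulnerable ones
    # have probability mass spread toward the uniform distribution.
    return one_hot * (1.0 - eps) + eps / n_classes
```

For example, with `max_smoothing=0.3`, a fully robust example (vulnerability 0) keeps its hard label, while a fully vulnerable one has its true-class target reduced from 1.0 to 0.8, discouraging overconfident predictions on exactly the inputs the paper finds to be poorly calibrated.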
