Title
Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction
Authors
Abstract
Deep neural networks are easily fooled by imperceptible perturbations. At present, adversarial training (AT) is the most effective method for enhancing a model's robustness against adversarial examples. However, because adversarial training solves a min-max optimization problem, robustness and generalization are in tension compared with natural training: improving the model's robustness reduces its generalization. To address this issue, this paper introduces a new concept, the confidence threshold (CT), and proves that reducing the confidence threshold, termed confidence threshold reduction (CTR), improves both the generalization and the robustness of the model. Specifically, to reduce the CT in natural training (i.e., natural training with CTR), we propose a mask-guided divergence loss function (MDL) consisting of a cross-entropy loss term and an orthogonal term. Empirical and theoretical analysis demonstrates that the MDL loss simultaneously improves the robustness and generalization of naturally trained models. However, the robustness improvement from natural training with CTR is not comparable to that of adversarial training. Therefore, for adversarial training, we propose a standard deviation loss function (STD), which minimizes the differences among the probabilities of the wrong categories; integrating it into the adversarial training loss reduces the CT. Empirical and theoretical analysis demonstrates that the STD-based loss function further improves the robustness of the adversarially trained model while keeping natural accuracy unchanged or slightly improved.
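The abstract describes the STD loss only at a high level (minimizing the spread of the wrong-class probabilities). A minimal sketch of one plausible reading, using NumPy: the per-sample standard deviation of the non-target softmax probabilities, averaged over the batch. The function name and the exact formulation (standard deviation vs. variance, batch averaging) are assumptions for illustration, not the paper's definitive implementation.

```python
import numpy as np

def std_loss(probs, labels):
    """Hypothetical sketch of an STD-style term: the standard deviation
    of the wrong-class probabilities, averaged over the batch.
    probs: (batch, n_classes) softmax outputs; labels: (batch,) targets."""
    batch, n_classes = probs.shape
    # Mask out the true-class probability for each sample.
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(batch), labels] = False
    wrong = probs[mask].reshape(batch, n_classes - 1)
    # Zero when the wrong classes share probability mass equally.
    return wrong.std(axis=1).mean()

# Uniform wrong-class mass gives zero loss...
uniform = np.array([[0.7, 0.1, 0.1, 0.1]])
print(std_loss(uniform, np.array([0])))   # 0.0
# ...while an uneven spread is penalized.
uneven = np.array([[0.7, 0.2, 0.05, 0.05]])
print(std_loss(uneven, np.array([0])))    # > 0
```

Under this reading, driving the term to zero flattens the distribution over wrong classes, which matches the abstract's stated goal of minimizing the differences among their probabilities.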