Paper Title
To be Robust or to be Fair: Towards Fairness in Adversarial Training
Paper Authors
Paper Abstract
Adversarial training algorithms have been shown to reliably improve machine learning models' robustness against adversarial examples. However, we find that adversarial training tends to introduce a severe disparity in both accuracy and robustness between different groups of data. For instance, a PGD-adversarially-trained ResNet18 model on CIFAR-10 has 93% clean accuracy and 67% PGD ℓ∞-8 robust accuracy on the class "automobile", but only 65% and 17% on the class "cat". This phenomenon occurs even on balanced datasets and is absent in naturally trained models that use only clean samples. In this work, we show both empirically and theoretically that this phenomenon can arise under general adversarial training algorithms that minimize a DNN model's robust error. Motivated by these findings, we propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem in adversarial defense. Experimental results validate the effectiveness of FRL.
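The class-wise gap quoted above can be observed by evaluating a robustly trained model's clean and PGD robust accuracy separately for each CIFAR-10 class. Below is a minimal PyTorch sketch of such an evaluation; it is not the paper's code. The checkpoint name "resnet18_at.pt", the use of torchvision's ResNet18, and the attack settings (eps = 8/255, step size 2/255, 10 steps, assuming "ℓ∞-8" refers to the common 8/255 CIFAR-10 budget) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): compare per-class clean accuracy and
# PGD robust accuracy of an adversarially trained model on CIFAR-10.
# "resnet18_at.pt" is a hypothetical checkpoint; eps=8/255 assumes the usual
# CIFAR-10 l-infinity budget implied by "l-infty-8".
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """l-infinity PGD: iterated signed-gradient ascent, projected back into the
    eps-ball around the clean input and clipped to the valid pixel range."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def per_class_accuracy(model, loader, num_classes=10, attack=None):
    """Accuracy per class, on clean inputs (attack=None) or on adversarial
    inputs generated by `attack`."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack is not None:
            x = attack(model, x, y)  # gradients are needed inside the attack
        with torch.no_grad():
            pred = model(x).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            correct[c] += (pred[mask] == c).sum().item()
            total[c] += mask.sum().item()
    return correct / total


if __name__ == "__main__":
    # ToTensor() keeps pixels in [0, 1], consistent with the eps=8/255 budget.
    test_set = torchvision.datasets.CIFAR10(
        root="./data", train=False, download=True, transform=T.ToTensor())
    loader = torch.utils.data.DataLoader(test_set, batch_size=256, shuffle=False)

    model = resnet18(num_classes=10).to(device)
    # Hypothetical checkpoint of a PGD-adversarially-trained ResNet18.
    model.load_state_dict(torch.load("resnet18_at.pt", map_location=device))

    clean_acc = per_class_accuracy(model, loader)
    robust_acc = per_class_accuracy(model, loader, attack=pgd_attack)
    for c, name in enumerate(test_set.classes):
        print(f"{name:>10s}: clean {clean_acc[c].item():.2%}  "
              f"robust {robust_acc[c].item():.2%}")
```

On a robustly trained model, this loop surfaces the kind of class-wise disparity the abstract quotes (e.g. "automobile" vs. "cat"). The FRL framework itself is not sketched here, since the abstract does not specify its training procedure.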