Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Paper Title

Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Authors

Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon

Abstract

Convolutional neural networks (CNNs) have made significant advances; however, they are widely known to be vulnerable to adversarial attacks. Adversarial training is the most widely used technique for improving adversarial robustness to strong white-box attacks. Prior works have evaluated and improved the average robustness of models without class-wise evaluation. Average evaluation alone might provide a false sense of robustness. For example, an attacker can focus on attacking the vulnerable class, which can be dangerous, especially when the vulnerable class is a critical one, such as "human" in autonomous driving. We present an empirical study on the class-wise accuracy and robustness of adversarially trained models. We find that an inter-class discrepancy in accuracy and robustness exists even when the training dataset has an equal number of samples for each class. For example, in CIFAR10, "cat" is much more vulnerable than other classes. Moreover, this inter-class discrepancy also exists for normally trained models, while adversarial training tends to further increase the discrepancy. Our work aims to investigate the following questions: (a) Is the phenomenon of inter-class discrepancy universal regardless of datasets, model architectures, and optimization hyper-parameters? (b) If so, what are possible explanations for the inter-class discrepancy? (c) Can the techniques proposed for long-tailed classification be readily extended to adversarial training to address the inter-class discrepancy?
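To make the contrast between average and class-wise evaluation concrete, the following is a minimal sketch rather than the authors' evaluation code. The function name `class_wise_accuracy` and the toy labels and predictions are hypothetical and chosen only for illustration. Class-wise robustness could be measured the same way by passing a model's predictions on adversarially perturbed inputs.

```python
from collections import defaultdict


def class_wise_accuracy(labels, predictions):
    """Return {class: accuracy}, computed over samples whose true label is that class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


# Hypothetical toy data with a balanced number of samples per class.
labels = ["cat"] * 4 + ["dog"] * 4 + ["ship"] * 4
predictions = [
    "cat", "dog", "dog", "ship",      # true class: cat
    "dog", "dog", "dog", "cat",       # true class: dog
    "ship", "ship", "ship", "ship",   # true class: ship
]

per_class = class_wise_accuracy(labels, predictions)
average = sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)
worst_class = min(per_class, key=per_class.get)

print(per_class)
print(f"average accuracy: {average:.3f}, weakest class: {worst_class}")
```

In this toy case the average hides a large gap between the weakest and strongest class even though every class has the same number of samples, which is the kind of discrepancy the abstract describes.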
