How explainable are adversarially-robust CNNs?

Paper title

How explainable are adversarially-robust CNNs?

Authors

Mehdi Nourelahi, Lars Kotthoff, Peijie Chen, Anh Nguyen

Abstract

Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs that have stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies only evaluate methods on 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first large-scale evaluation of the relations among the three criteria, using 9 feature-importance methods and 12 ImageNet-trained CNNs that span 3 training algorithms and 5 CNN architectures. We find several important insights and recommendations for ML practitioners. First, adversarially robust CNNs have a higher explainability score on gradient-based attribution methods (but not on CAM-based or perturbation-based methods). Second, AdvProp models, despite being more accurate than both vanilla and robust models alone, are not superior in explainability. Third, among the 9 feature attribution methods tested, GradCAM and RISE are consistently the best methods. Fourth, Insertion and Deletion are biased towards vanilla and robust models respectively, due to their strong correlation with the confidence score distributions of a CNN. Fifth, we did not find a single CNN to be the best in all three criteria, which interestingly suggests that CNNs are harder to interpret as they become more accurate.
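The fourth finding is easier to see with a concrete picture of the Deletion metric. The sketch below is a minimal NumPy illustration, not the authors' evaluation code: it assumes the common definition of Deletion (replace pixels with a baseline in order of decreasing attribution, then take the area under the class-probability curve, where lower is better). The function `deletion_auc`, the helper `make_toy_model`, and the `peak_confidence` parameter are hypothetical names introduced only for this example.

```python
import numpy as np

def deletion_auc(image, attribution, predict_prob, steps=4, baseline=0.0):
    """Area under the probability curve as the highest-attribution
    pixels are progressively replaced with a baseline value."""
    h, w = attribution.shape
    order = np.argsort(attribution.ravel())[::-1]  # most important first
    current = image.astype(float).copy()
    probs = [predict_prob(current)]                # probability before deletion
    per_step = int(np.ceil(h * w / steps))
    for start in range(0, h * w, per_step):
        idx = order[start:start + per_step]
        rows, cols = np.unravel_index(idx, (h, w))
        current[rows, cols] = baseline             # delete this batch of pixels
        probs.append(predict_prob(current))
    probs = np.asarray(probs)
    x = np.linspace(0.0, 1.0, len(probs))          # fraction of pixels deleted
    # Trapezoidal rule, written out explicitly
    return float(np.sum((probs[1:] + probs[:-1]) / 2.0 * np.diff(x)))

def make_toy_model(peak_confidence):
    """Hypothetical stand-in for a CNN: the class evidence lives in the
    top-left 2x2 patch, and the output is capped at peak_confidence."""
    def predict_prob(img):
        return peak_confidence * img[:2, :2].mean()
    return predict_prob

image = np.ones((4, 4))
attribution = np.zeros((4, 4))
attribution[:2, :2] = [[4, 3], [2, 1]]             # same heatmap for both models

high_conf = deletion_auc(image, attribution, make_toy_model(0.9))
low_conf = deletion_auc(image, attribution, make_toy_model(0.3))
print(high_conf, low_conf)
```

Both toy models receive the same attribution map and lose all of their evidence at the same step, yet the lower-confidence model obtains the lower (better) Deletion score simply because its probabilities start lower. This is only a toy analogue of the confidence-related bias the abstract describes.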
