Paper Title

Unifying Model Explainability and Robustness via Machine-Checkable Concepts

Paper Authors

Vedant Nanda, Till Speicher, John P. Dickerson, Krishna P. Gummadi, Muhammad Bilal Zafar

Paper Abstract

As deep neural networks (DNNs) get adopted in an ever-increasing number of applications, explainability has emerged as a crucial desideratum for these models. In many real-world tasks, one of the principal reasons for requiring explainability is to in turn assess prediction robustness, where predictions (i.e., class labels) that do not conform to their respective explanations (e.g., presence or absence of a concept in the input) are deemed to be unreliable. However, most, if not all, prior methods for checking explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant manual intervention, which hinders their large-scale deployability. In this paper, we propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts. Our framework defines a large number of concepts that the DNN explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness. Both steps are executed in an automated manner without requiring any human intervention and are easily scaled to datasets with a very large number of classes. Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly: the predictions marked to be robust by our framework have significantly higher accuracy and are more robust to adversarial perturbations.
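The test-time explanation-conformity check described above can be pictured with a minimal sketch. Everything here is a hypothetical stand-in rather than the paper's implementation: `model`, `concept_detectors`, and `class_concepts` are illustrative placeholders. The idea being shown is only the conformity rule itself: a prediction is marked robust when the concepts detected in the input agree with the concepts expected for the predicted class.

```python
import numpy as np

# Illustrative sketch of a test-time explanation-conformity check.
# All components below are hypothetical stand-ins, not the paper's API.

rng = np.random.default_rng(0)

def model(x):
    """Stand-in DNN: returns a predicted class label for input x."""
    return int(rng.integers(0, 10))

def concept_detectors(x):
    """Stand-in detectors: return presence scores in [0, 1] for each
    machine-checkable concept."""
    return rng.random(5)

# Assumed mapping from each class to the concepts expected to be present
# when that class is predicted (here: random, purely for illustration).
class_concepts = {c: rng.random(5) > 0.5 for c in range(10)}

def conformity_check(x, threshold=0.5):
    """Mark a prediction as robust only if the detected concepts match
    the concepts expected for the predicted class."""
    label = model(x)
    detected = concept_detectors(x) > threshold
    expected = class_concepts[label]
    return label, bool(np.array_equal(detected, expected))

label, robust = conformity_check(np.zeros(32))
print(f"predicted class {label}, marked robust: {robust}")
```

In the framework itself, both the concept definitions and the check are produced automatically, which is what lets the approach scale to datasets with very many classes without human intervention.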
