Title
ViT-CX: Causal Explanation of Vision Transformers
Authors
Abstract
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specifically for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs, such as causal overdetermination, are also considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job of revealing all important evidence for the predictions than previous methods. The explanations generated by ViT-CX also show significantly better faithfulness to the model. The code and appendix are available at https://github.com/vaynexie/CausalX-ViT.