Paper Title
Dependency Decomposition and a Reject Option for Explainable Models
Paper Authors
Paper Abstract
Deploying machine learning models in safety-related domains (e.g. autonomous driving, medical diagnosis) demands approaches that are explainable, robust against adversarial attacks, and aware of model uncertainty. Recent deep learning models perform extremely well on various inference tasks, but the black-box nature of these approaches leads to weaknesses regarding the three requirements mentioned above. Recent advances offer methods to visualize features, describe attribution of the input (e.g. heatmaps), provide textual explanations, or reduce dimensionality. However, are explanations for classification tasks dependent on or independent of each other? For instance, is the shape of an object dependent on its color? What is the effect of using the predicted class for generating explanations, and vice versa? In the context of explainable deep learning models, we present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs and the explaining variables (e.g. attributes, texts, heatmaps). To this end, we perform an Explanation Dependency Decomposition (EDD). We analyze the implications of the different dependencies and propose two ways of generating the explanation. Finally, we use the explanation to verify (accept or reject) the prediction.
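A minimal sketch of what such a dependency decomposition could look like, assuming the joint distribution over the class y and an explanation variable e given an input x is factorized via the chain rule; the symbols x, y, e and the two orderings below are illustrative assumptions, not the paper's exact formulation:

% Illustrative chain-rule factorizations (assumed notation, not taken from the paper)
% class-first: the explanation is generated conditioned on the predicted class
p(y, e \mid x) = p(y \mid x)\, p(e \mid y, x)
% explanation-first: the class prediction is conditioned on the explanation
p(y, e \mid x) = p(e \mid x)\, p(y \mid e, x)

Under the second ordering, a disagreement between the explanation-conditioned class posterior and the directly predicted class could serve as one possible criterion for the reject option mentioned in the abstract.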