Paper Title
Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability
Paper Authors
Paper Abstract
Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces when applied to real-world problems is interpretability. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive, and thus cannot satisfy the real-time requirements of real-world scenarios, or cannot produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout), and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach.
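To make the idea of combining policy distillation with input gradient regularization concrete, below is a minimal sketch of one training objective in PyTorch. It is not the authors' exact formulation: the `student`/`teacher` networks, the `nonsalient_mask` marking regions where gradients are suppressed, and the weight `lam` are illustrative placeholders under assumed shapes.

```python
# Minimal sketch of a DIGR-style objective (assumed PyTorch setup; names are illustrative).
import torch
import torch.nn.functional as F

def digr_loss(student, teacher, obs, nonsalient_mask, lam=1.0):
    """Distillation loss plus a penalty on input gradients outside salient regions.

    obs:             batch of observations (differentiable inputs)
    nonsalient_mask: 1.0 where the input is treated as irrelevant, 0.0 elsewhere
    """
    obs = obs.clone().requires_grad_(True)

    # Policy distillation: match the student's action distribution to the teacher's.
    with torch.no_grad():
        teacher_logits = teacher(obs)
    student_logits = student(obs)
    distill = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )

    # Selective input gradient regularization: push the student's input gradients
    # toward zero only in non-salient regions, so the remaining gradient-based
    # saliency map stays concentrated and interpretable.
    grads = torch.autograd.grad(student_logits.sum(), obs, create_graph=True)[0]
    grad_penalty = (grads * nonsalient_mask).pow(2).mean()

    return distill + lam * grad_penalty
```

Under this sketch, a saliency map at deployment time is just the gradient of the distilled policy's output with respect to its input, i.e. a single backward pass, which is what makes the approach cheap enough for real-time use.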