Paper Title
Explaining Motion Relevance for Activity Recognition in Video Deep Learning Models
Paper Authors
Paper Abstract
A small subset of explainability techniques developed initially for image recognition models has recently been applied to interpret 3D Convolutional Neural Network models in activity recognition tasks. Much like the models themselves, these techniques require little or no modification to be compatible with 3D inputs. However, these explanation techniques treat spatial and temporal information jointly. Therefore, using such explanation techniques, a user cannot explicitly distinguish the role of motion in a 3D model's decision. In fact, it has been shown that these models do not appropriately factor motion information into their decisions. We propose a selective relevance method for adapting 2D explanation techniques to provide motion-specific explanations, better aligning them with the human understanding of motion as conceptually separate from static spatial features. We demonstrate the utility of our method in conjunction with several widely used 2D explanation methods, and show that it improves explanation selectivity for motion. Our results show that the selective relevance method can not only provide insight into the role played by motion in the model's decision -- in effect, revealing and quantifying the model's spatial bias -- but also simplify the resulting explanations for human consumption.
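For illustration only: the abstract does not specify the algorithm, so the sketch below is one hypothetical reading of "selective relevance", assuming per-frame 2D relevance maps (e.g., produced by applying a 2D explanation method frame-wise) and keeping only the relevance that varies over time. The function name, the threshold parameter tau, and the array shapes are assumptions for this sketch, not the authors' implementation.

import numpy as np

def selective_motion_relevance(relevance, tau=0.1):
    """Suppress relevance at pixels whose relevance is static over time.

    relevance: array of shape (T, H, W), per-frame 2D relevance maps.
    tau: fraction of the maximum temporal change used as the cutoff
         (a hypothetical choice for this sketch).
    """
    # Absolute temporal change of relevance at each pixel: (T-1, H, W).
    d_rel = np.abs(np.diff(relevance, axis=0))
    # Repeat the last slice so the mask matches the input length: (T, H, W).
    d_rel = np.concatenate([d_rel, d_rel[-1:]], axis=0)
    # Keep only pixels whose relevance changes enough over time.
    mask = d_rel > tau * d_rel.max()
    return relevance * mask

# Example usage on dummy data: 8 frames of 32x32 relevance maps.
rel = np.random.rand(8, 32, 32)
motion_rel = selective_motion_relevance(rel, tau=0.2)

Under this reading, relevance attached to static spatial features (background, object appearance) is zeroed out, leaving only the motion-related portion, which is consistent with the abstract's goal of separating motion from static spatial features.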