Paper Title
Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance
Paper Authors
Paper Abstract
Anomaly Detection is an unsupervised learning task aimed at detecting anomalous behaviours with respect to historical data. In particular, multivariate Anomaly Detection plays an important role in many applications, thanks to its capability of summarizing the status of a complex system or observed phenomenon with a single indicator (typically called the "Anomaly Score") and to the unsupervised nature of the task, which does not require human tagging. The Isolation Forest is one of the most commonly adopted algorithms in the field of Anomaly Detection, due to its proven effectiveness and low computational complexity. A major problem affecting the Isolation Forest is its lack of interpretability, a consequence of the inherent randomness governing the splits performed by the Isolation Trees, the building blocks of the Isolation Forest. In this paper we propose effective, yet computationally inexpensive, methods to define feature importance scores at both the global and the local level for the Isolation Forest. Moreover, we define a procedure to perform unsupervised feature selection for Anomaly Detection problems based on our interpretability method; this procedure also serves the purpose of tackling the challenging task of feature importance evaluation in unsupervised anomaly detection. We assess the performance on several synthetic and real-world datasets, including comparisons against state-of-the-art interpretability techniques, and make the code publicly available to enhance reproducibility and foster research in the field.
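To make the mechanism concrete, the following is a minimal, standard-library-only sketch of how randomly split Isolation Trees yield an Anomaly Score, plus a crude depth-weighted tally of the features used along a point's path. The abstract does not state the DIFFI formulas, so `local_feature_tally` is a hypothetical stand-in meant only to illustrate the idea of a depth-based local importance; it is not the authors' definition. All function names, the toy dataset, and the parameter values are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using uniformly random features and thresholds."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # simplification: stop here instead of retrying another feature
        return {"size": len(points)}
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def trace(tree, x):
    """Return (adjusted path length, features split on) for point x in one tree."""
    depth, used, node = 0, [], tree
    while "feature" in node:
        used.append(node["feature"])
        side = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[side]
        depth += 1
    return depth + c_factor(node["size"]), used


def build_forest(data, n_trees, sample_size, rng):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    max_depth = math.ceil(math.log2(sample_size))
    return [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def anomaly_score(forest, x, sample_size):
    """Standard Isolation Forest score: shorter average paths give scores nearer 1."""
    mean_path = sum(trace(tree, x)[0] for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(sample_size))


def local_feature_tally(forest, x, n_features):
    """Hypothetical heuristic (NOT the DIFFI formula): each split on x's path
    adds 1 / path_length to its feature, so short isolating paths weigh more."""
    tally = [0.0] * n_features
    for tree in forest:
        _, used = trace(tree, x)
        for feature in used:
            tally[feature] += 1.0 / len(used)
    total = sum(tally)
    return [t / total for t in tally] if total else tally


# Toy data (assumed): 128 Gaussian inliers and one point anomalous in feature 1 only.
rng = random.Random(0)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(128)]
outlier = (0.0, 10.0)
data = inliers + [outlier]

forest = build_forest(data, n_trees=200, sample_size=len(data), rng=rng)
outlier_score = anomaly_score(forest, outlier, len(data))
inlier_score = anomaly_score(forest, (0.0, 0.0), len(data))
importance = local_feature_tally(forest, outlier, n_features=2)

print("outlier score:", round(outlier_score, 3))
print("inlier score:", round(inlier_score, 3))
print("normalized feature tally for the outlier:", importance)
```

In this toy setting the outlier should receive the higher score, and the tally should lean toward feature 1, the only coordinate in which the point is anomalous. The score alone would not reveal which feature is responsible, which is the interpretability gap the abstract describes.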