Paper Title
Explanation Method for Anomaly Detection on Mixed Numerical and Categorical Spaces
Paper Authors
Paper Abstract
Most proposals in the anomaly detection field focus exclusively on the detection stage, especially the recent deep learning approaches. While providing highly accurate predictions, these models often lack transparency, acting as "black boxes". This criticism has grown to the point that explanation is now considered highly relevant to acceptability and reliability. In this paper, we address this issue by inspecting the ADMNC (Anomaly Detection on Mixed Numerical and Categorical Spaces) model, an existing, highly accurate yet opaque anomaly detector capable of operating on both numerical and categorical inputs. This work presents the extension EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), which adds explainability to the predictions obtained with the original model. We preserve the scalability of the original method thanks to the Apache Spark framework. EADMNC leverages the formulation of the previous ADMNC model to offer pre hoc and post hoc explainability while maintaining the accuracy of the original architecture. We present a pre hoc model that globally explains the outputs by segmenting the input data into homogeneous groups, each described with only a few variables. We design a graphical representation based on regression trees, which supervisors can inspect to understand the differences between normal and anomalous data. Our post hoc explanations consist of a text-based template method that locally provides textual arguments supporting each detection. We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection. The usefulness of the explanations is assessed through theoretical analysis using expert knowledge in the network intrusion domain.
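The two kinds of explanation mentioned in the abstract can be illustrated with a small sketch. This is not the EADMNC implementation, which builds on the ADMNC formulation and runs on Apache Spark; it is a minimal, single-machine illustration under stated assumptions: the anomaly scores come from any detector, all features are numeric, the "tree" is a single regression split, and the names `best_split`, `explain`, `duration`, and `bytes` are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, scores):
    """Return the (feature, threshold) split that minimizes the SSE of the scores.

    rows:   list of dicts mapping feature name -> numeric value (assumed format)
    scores: anomaly scores from any detector, higher meaning more anomalous
    This is a depth-1 regression tree: it separates the data into two groups
    whose anomaly scores are as homogeneous as possible.
    """
    best = None  # (total_sse, feature, threshold)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [s for r, s in zip(rows, scores) if r[feature] <= threshold]
            right = [s for r, s in zip(rows, scores) if r[feature] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, feature, threshold)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1], best[2]


def explain(row, score, feature, threshold):
    """Fill a fixed text template for one record (local, post hoc style)."""
    side = "at most" if row[feature] <= threshold else "above"
    return (f"Record scored {score:.2f}: its {feature} is {row[feature]}, "
            f"which is {side} the group boundary {threshold}.")


# Hypothetical toy data: two network-flow features and made-up scores.
rows = [
    {"duration": 1, "bytes": 100},
    {"duration": 2, "bytes": 900},
    {"duration": 3, "bytes": 120},
    {"duration": 50, "bytes": 800},
    {"duration": 60, "bytes": 110},
]
scores = [0.1, 0.2, 0.1, 0.9, 0.8]

feature, threshold = best_split(rows, scores)
print(explain(rows[3], scores[3], feature, threshold))
```

On this toy data the split falls on `duration`, since it separates the low-scoring records from the high-scoring ones far better than `bytes` does. A deeper tree would apply the same split search recursively within each group, and categorical features would need subset-based splits instead of thresholds.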