Paper Title

Deep Isolation Forest for Anomaly Detection

Authors

Hongzuo Xu, Guansong Pang, Yijie Wang, Yongjun Wang

Abstract

Isolation forest (iForest) has emerged as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and its strong scalability. Nevertheless, its linear axis-parallel isolation method often leads to (i) failure in detecting hard anomalies that are difficult to isolate in high-dimensional or non-linearly-separable data space, and (ii) a notorious algorithmic bias that assigns unexpectedly low anomaly scores to artefact regions. These issues contribute to high false negative errors. Several iForest extensions have been introduced, but they essentially still employ shallow, linear data partition, which restricts their power in isolating true anomalies. Therefore, this paper proposes deep isolation forest. We introduce a new representation scheme that utilises casually initialised neural networks to map original data into random representation ensembles, where random axis-parallel cuts are subsequently applied to perform the data partition. This representation scheme facilitates high freedom of the partition in the original data space (equivalent to non-linear partition on subspaces of varying sizes), encouraging a unique synergy between random representations and random partition-based isolation. Extensive experiments show that our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on tabular, graph and time series datasets; our model also inherits desired scalability from iForest.
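The representation scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. All function names, the two-layer tanh network, the hidden width of 16, the scaled Gaussian initialisation, and the default ensemble sizes are assumptions chosen for illustration. Scoring uses the standard iForest path-length score, and the paper's own scoring function may differ.

```python
import math
import random

def random_mlp(in_dim, hidden_dim, out_dim, rng):
    """Build an untrained network with random weights (assumed architecture)."""
    w1 = [[rng.gauss(0, 1 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0, 1 / math.sqrt(hidden_dim)) for _ in range(hidden_dim)]
          for _ in range(out_dim)]

    def forward(x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return forward

def build_tree(points, rng, depth, max_depth):
    """Grow one isolation tree with random axis-parallel cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def c(n):
    """Average path length of an unsuccessful BST search (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for unsplit leaf size."""
    if "dim" not in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["cut"] else tree["right"]
    return path_length(child, x, depth + 1)

def deep_iforest_scores(data, n_reps=10, trees_per_rep=10,
                        sample_size=64, rep_dim=8, seed=0):
    """Score each row of data (needs at least 2 rows); higher is more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    total = [0.0] * len(data)
    for _ in range(n_reps):
        # One random network per ensemble member gives one random representation.
        net = random_mlp(len(data[0]), 16, rep_dim, rng)
        reps = [net(x) for x in data]
        for _ in range(trees_per_rep):
            tree = build_tree(rng.sample(reps, psi), rng, 0, max_depth)
            for i, r in enumerate(reps):
                total[i] += path_length(tree, r)
    n_trees = n_reps * trees_per_rep
    return [2.0 ** (-(t / n_trees) / c(psi)) for t in total]
```

Calling `deep_iforest_scores(rows)` on a list of equal-length numeric lists returns one score per row in the range (0, 1]. Because each cut is axis-parallel in a randomly transformed space, it corresponds to a non-linear boundary in the original space, which is the source of the partition freedom the abstract refers to.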
