Paper Title
LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction
Paper Authors
Abstract
Fully supervised log anomaly detection methods carry the heavy burden of annotating massive amounts of unlabeled log data. Recently, many semi-supervised methods have been proposed to reduce annotation costs with the help of parsed templates. However, these methods consider each keyword independently, disregarding both the correlations between keywords and the contextual relationships among log sequences. In this paper, we propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences. Specifically, we design an end-to-end iterative process in which the keywords of unlabeled logs are first extracted to construct a log-event graph. We then build a subgraph annotator to generate pseudo labels for unlabeled log sequences. To improve annotation quality, we adopt a self-supervised task to pre-train the subgraph annotator. A detection model is then trained with the generated pseudo labels. Conditioned on the classification results, we re-extract the keywords from the log sequences and update the log-event graph for the next iteration. Experiments on five benchmarks validate the effectiveness of LogLG for detecting anomalies in unlabeled log data and demonstrate that LogLG, as a state-of-the-art weakly supervised method, achieves significant performance improvements over existing methods.
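The iterative process described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so every component here is a simplified stand-in chosen for illustration. The log-event graph is assumed to be a keyword co-occurrence graph, the subgraph annotator is replaced by a hand-written scoring rule (its self-supervised pre-training is omitted), the detection model is replaced by a small naive Bayes classifier, and the seed keywords, function names, and toy log sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_event_graph(sequences, keywords):
    """Assumed graph form: edge weight = number of sequences where two keywords co-occur."""
    graph = defaultdict(Counter)
    for seq in sequences:
        present = sorted(set(seq) & keywords)
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def annotate(seq, class_keywords, graph):
    """Stand-in annotator: score a class by its matched keywords plus the edges among them."""
    scores = {}
    for label, kws in class_keywords.items():
        hits = sorted(set(seq) & kws)
        edges = sum(graph[a][b] for a, b in combinations(hits, 2))
        scores[label] = len(hits) + edges
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    # Abstain (no pseudo label) when nothing matches or the top two classes tie.
    if ranked[0][1] == 0 or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def train_detector(sequences, pseudo_labels):
    """Stand-in detector: naive Bayes with add-one smoothing, trained on pseudo labels only."""
    counts = defaultdict(Counter)
    priors = Counter()
    for seq, label in zip(sequences, pseudo_labels):
        if label is not None:
            counts[label].update(seq)
            priors[label] += 1
    vocab = {tok for seq in sequences for tok in seq}

    def predict(seq):
        def log_score(label):
            total = sum(counts[label].values()) + len(vocab)
            # Unnormalised class count as prior; the argmax is unaffected.
            return math.log(priors[label]) + sum(
                math.log((counts[label][tok] + 1) / total) for tok in seq)
        return max(priors, key=log_score)

    return predict


def reextract_keywords(sequences, predictions, top_k):
    """Keep, per class, the top_k tokens seen in more sequences of that class than of the others."""
    per_class = defaultdict(Counter)
    for seq, label in zip(sequences, predictions):
        per_class[label].update(set(seq))
    new_keywords = {}
    for label, own in per_class.items():
        others = Counter()
        for other_label, other_counts in per_class.items():
            if other_label != label:
                others.update(other_counts)
        ranked = sorted(own, key=lambda t: (others[t] - own[t], t))
        new_keywords[label] = {t for t in ranked[:top_k] if own[t] > others[t]}
    return new_keywords


def run_iterations(sequences, seed_keywords, iterations=2, top_k=2):
    class_keywords = {label: set(kws) for label, kws in seed_keywords.items()}
    predictions = []
    for _ in range(iterations):
        all_keywords = set().union(*class_keywords.values())
        graph = build_event_graph(sequences, all_keywords)
        pseudo = [annotate(seq, class_keywords, graph) for seq in sequences]
        predict = train_detector(sequences, pseudo)
        predictions = [predict(seq) for seq in sequences]
        # Conditioned on the classification results, refresh keywords for the next round.
        class_keywords = reextract_keywords(sequences, predictions, top_k)
    return predictions, class_keywords


# Hypothetical toy log sequences, already reduced to event tokens.
sequences = [
    ["receive", "block", "verify", "ok"],
    ["receive", "block", "verify", "ok"],
    ["receive", "block", "exception", "timeout"],
    ["open", "block", "exception", "retry"],
    ["open", "block", "verify", "ok"],
    ["open", "retry", "timeout"],
]
seeds = {"normal": {"ok"}, "anomalous": {"exception"}}  # hypothetical seed keywords
predictions, keywords = run_iterations(sequences, seeds)
print(predictions)
print(keywords)
```

In this toy run the last sequence matches no seed keyword, so the annotator abstains on it in the first round; the detector trained on the remaining pseudo labels still classifies it, and the re-extracted keywords then cover it in the second round. That is the feedback loop the abstract describes, reduced to its simplest form.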