An Entropic Relevance Measure for Stochastic Conformance Checking in Process Mining

Paper Title

An Entropic Relevance Measure for Stochastic Conformance Checking in Process Mining

Authors

Artem Polyvyanyy, Alistair Moffat, Luciano García-Bañuelos

Abstract

Given an event log as a collection of recorded real-world process traces, process mining aims to automatically construct a process model that is both simple and provides a useful explanation of the traces. Conformance checking techniques are then employed to characterize and quantify commonalities and discrepancies between the log's traces and the candidate models. Recent approaches to conformance checking acknowledge that the elements being compared are inherently stochastic - for example, some traces occur frequently and others infrequently - and seek to incorporate this knowledge in their analyses. Here we present an entropic relevance measure for stochastic conformance checking, computed as the average number of bits required to compress each of the log's traces, based on the structure and information about relative likelihoods provided by the model. The measure penalizes traces from the event log not captured by the model and traces described by the model but absent in the event log, thus addressing both precision and recall quality criteria at the same time. We further show that entropic relevance is computable in time linear in the size of the log, and provide evaluation outcomes that demonstrate the feasibility of using the new approach in industrial settings.
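
To make the "average number of bits per trace" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the representation of the model as a dictionary from traces to probabilities, the uniform fallback code for traces the model does not cover, and the binary-entropy selector term are all assumptions not spelled out in the abstract.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a two-outcome choice with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def average_bits_per_trace(log, model_probs):
    """Average number of bits needed to encode each trace of `log`.

    log         -- list of traces, each a tuple of activity labels
    model_probs -- dict mapping a trace to the probability the model
                   assigns to it (hypothetical model representation)

    Assumptions (not taken from the abstract):
      * a trace with model probability p > 0 costs -log2(p) bits;
      * any other trace is spelled out symbol by symbol, plus an end
        marker, with a uniform code over the log's activities;
      * a binary-entropy term pays for signalling which of the two
        codes is used for a trace.
    """
    counts = Counter(log)
    total = len(log)
    alphabet = {activity for trace in log for activity in trace}
    fallback_symbol_bits = math.log2(len(alphabet) + 1)

    covered = sum(c for t, c in counts.items() if model_probs.get(t, 0.0) > 0.0)
    selector_bits = binary_entropy(covered / total)

    bits = 0.0
    for trace, count in counts.items():
        p = model_probs.get(trace, 0.0)
        if p > 0.0:
            cost = -math.log2(p)
        else:
            cost = (len(trace) + 1) * fallback_symbol_bits
        bits += count * cost
    return selector_bits + bits / total


log = [("a", "b")] * 3 + [("a", "c")]
full_model = {("a", "b"): 0.5, ("a", "c"): 0.5}
partial_model = {("a", "b"): 1.0}

full_cost = average_bits_per_trace(log, full_model)
partial_cost = average_bits_per_trace(log, partial_model)
```

In this sketch a single pass over the distinct traces is enough, in keeping with the linear-time claim. A model that misses a log trace pays the expensive fallback code (a recall penalty). A model that spreads probability over traces absent from the log leaves less probability for the traces that do occur, so those cost more bits (a precision penalty).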
