Paper Title

Delta-Closure Structure for Studying Data Distribution

Authors

Aleksey Buzmakov, Tatiana Makhalova, Sergei O. Kuznetsov, Amedeo Napoli

Abstract

In this paper, we revisit pattern mining and study the distribution underlying a binary dataset by means of the closure structure, which is based on passkeys, i.e., minimum generators in equivalence classes robust to noise. We introduce $\Delta$-closedness, a generalization of the closure operator, where $\Delta$ measures how a closed set differs from its upper neighbors in the partial order induced by closure. A $\Delta$-class of equivalence includes minimum and maximum elements and allows us to characterize the distribution underlying the data. Moreover, the set of $\Delta$-classes of equivalence can be partitioned into the so-called $\Delta$-closure structure. In particular, a $\Delta$-class of equivalence with a high level demonstrates correlations among many attributes, which are supported by more observations when $\Delta$ is large. In the experiments, we study the $\Delta$-closure structure of several real-world datasets and show that this structure is very stable for large $\Delta$ and does not substantially depend on the data sampling used for the analysis.
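To make the notions of closure and $\Delta$ concrete, the following is a minimal sketch on a toy binary dataset. It assumes one common reading of $\Delta$: the smallest drop in support when a single attribute is added to a closed itemset. The abstract does not give the formal definition, so this reading is an assumption, and the dataset and the names `DATA`, `extent`, `closure`, and `delta` are hypothetical and not taken from the paper.

```python
from typing import FrozenSet, List, Set

# Hypothetical toy dataset: each row is the set of attributes an object has.
DATA: List[FrozenSet[str]] = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bd"),
]
ATTRIBUTES: FrozenSet[str] = frozenset().union(*DATA)


def extent(itemset: FrozenSet[str]) -> Set[int]:
    """Indices of the rows that contain every attribute of `itemset`."""
    return {i for i, row in enumerate(DATA) if itemset <= row}


def closure(itemset: FrozenSet[str]) -> FrozenSet[str]:
    """Largest attribute set shared by all rows in the extent of `itemset`."""
    rows = extent(itemset)
    if not rows:
        # No supporting row: by convention the closure is the full attribute set.
        return ATTRIBUTES
    return frozenset.intersection(*(DATA[i] for i in rows))


def delta(itemset: FrozenSet[str]) -> int:
    """Smallest support drop when one attribute is added to the closure.

    Assumed reading of Delta, not the paper's formal definition.
    """
    closed = closure(itemset)
    support = len(extent(closed))
    drops = [support - len(extent(closed | {m})) for m in ATTRIBUTES - closed]
    # If no attribute can be added, fall back to the support itself (assumption).
    return min(drops) if drops else support
```

Under this reading, an itemset with a large `delta` value loses many supporting rows whenever it is extended, which matches the abstract's remark that a large $\Delta$ corresponds to correlations supported by more observations.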
