Paper title
Delta-Closure Structure for Studying Data Distribution
Paper authors
Paper abstract
In this paper, we revisit pattern mining and study the distribution underlying a binary dataset using the closure structure, which is based on passkeys, i.e., minimum generators in equivalence classes that are robust to noise. We introduce $\Delta$-closedness, a generalization of the closure operator, where $\Delta$ measures how much a closed set differs from its upper neighbors in the partial order induced by closure. A $\Delta$-class of equivalence includes minimum and maximum elements and allows us to characterize the distribution underlying the data. Moreover, the set of $\Delta$-classes of equivalence can be partitioned into the so-called $\Delta$-closure structure. In particular, a $\Delta$-class of equivalence with a high level demonstrates correlations among many attributes, which are supported by more observations when $\Delta$ is large. In the experiments, we study the $\Delta$-closure structure of several real-world datasets and show that this structure is very stable for large $\Delta$ and does not substantially depend on the data sampling used for the analysis.
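As an illustration of the notions named in the abstract, the following is a minimal sketch of a closure operator and a Δ value on a toy binary dataset. The abstract does not give a formula, so the definition used here is an assumption: Δ of an itemset is taken to be the smallest drop in support observed when any single further attribute is added, so that an ordinary closed itemset has Δ of at least 1. The function names (`extent`, `closure`, `delta`) and the toy rows are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Set

Row = FrozenSet[str]

# Hypothetical toy binary dataset: each row is the set of attributes it has.
data: List[Row] = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ab"),
    frozenset("a"),
    frozenset("bc"),
]


def extent(rows: List[Row], itemset: Set[str]) -> List[int]:
    """Indices of the rows that contain every attribute of `itemset`."""
    return [i for i, row in enumerate(rows) if itemset <= row]


def closure(rows: List[Row], itemset: Set[str]) -> Set[str]:
    """Largest attribute set shared by all rows that contain `itemset`."""
    covered = [rows[i] for i in extent(rows, itemset)]
    if not covered:
        # Convention for an empty extent: the closure is the full attribute set.
        return set().union(*rows)
    return set(frozenset.intersection(*covered))


def delta(rows: List[Row], itemset: Set[str]) -> int:
    """Assumed Δ: smallest loss of support when one more attribute is added."""
    attributes = set().union(*rows)
    support = len(extent(rows, itemset))
    drops = [
        support - len(extent(rows, itemset | {a}))
        for a in attributes - itemset
    ]
    # Convention (an assumption): with no attribute left to add, return the support.
    return min(drops) if drops else support


if __name__ == "__main__":
    for items in ({"a"}, {"c"}, {"a", "b"}, {"b", "c"}):
        print(sorted(items), sorted(closure(data, items)), delta(data, items))
```

Under this assumed definition, an itemset with Δ equal to 0 is not closed (here `{"c"}`, whose closure also contains `b`), while a larger Δ means that every one-attribute extension loses at least that many supporting rows.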