Paper title
Discovery data topology with the closure structure. Theoretical and practical aspects
Paper authors
Paper abstract
In this paper, we revisit pattern mining, and especially itemset mining, which allows one to analyze binary datasets by searching, in an unsupervised way, for interesting and meaningful association rules and their respective itemsets. While a summarization of a dataset based on a set of patterns does not provide a general and satisfying view of the dataset, we introduce a concise representation, the closure structure, based on closed itemsets and their minimum generators, to capture the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well suited to studying this data topology. We present and demonstrate theoretical results, as well as practical results obtained using the GDPM algorithm. GDPM is rather unique in its functionality, as it returns a characterization of the topology of a dataset in terms of complexity levels, highlighting the diversity and the distribution of the itemsets. Finally, a series of experiments shows how GDPM can be used in practice and what can be expected from its output.
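To make the notions of closed itemset, minimum generator and complexity level concrete, the following is a minimal brute-force sketch on a toy binary dataset. It is not the GDPM algorithm, whose details are not given in the abstract. The function names `closure` and `closure_levels`, the toy data, and the reading of "level" as the size of the smallest generator of a closed itemset are assumptions made for illustration only.

```python
from itertools import combinations


def closure(itemset, transactions, all_items):
    """Return the items shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Formal Concept Analysis convention: an itemset that occurs in no
        # transaction closes to the full set of items.
        return frozenset(all_items)
    return frozenset(set.intersection(*map(set, covering)))


def closure_levels(transactions):
    """Map each closed itemset to the size of its smallest generator.

    Hypothetical helper: exhaustive enumeration, exponential in the number
    of items, so only suitable for tiny examples.
    """
    transactions = [frozenset(t) for t in transactions]
    all_items = sorted(set().union(*transactions))
    levels = {}
    # Candidates are enumerated by increasing size, so the first candidate
    # that yields a given closed itemset is one of its smallest generators.
    for k in range(len(all_items) + 1):
        for candidate in combinations(all_items, k):
            closed = closure(frozenset(candidate), transactions, all_items)
            levels.setdefault(closed, k)
    return levels


# Toy binary dataset: each row is the set of items present in one object.
toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
levels = closure_levels(toy)
for closed, level in sorted(levels.items(), key=lambda kv: (kv[1], sorted(kv[0]))):
    print(level, sorted(closed))
```

In this toy example the closed itemset {a, c} sits at level 1 because the single item c already generates it, whereas {a, b} needs both of its items and sits at level 2. Grouping closed itemsets in this way gives a rough picture of how many of them can be reached from small generators, which is the kind of level-wise characterization the abstract attributes to the closure structure.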