Paper title
Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
Paper authors
Paper abstract
A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider which data-analytic characteristics the clusters they are aiming at are supposed to have, among them within-cluster homogeneity, between-cluster separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
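The idea of calibrating index values against random clusterings and then aggregating them can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two toy indexes (`within_homogeneity`, `separation`), the random-clustering generator (`random_clustering`, which assigns points to the nearest of `k` randomly drawn data points), the z-score calibration against random clusterings with the same number of clusters, and the weighted mean used for aggregation. The paper's own indexes, random-clustering schemes, and calibration details may differ.

```python
import numpy as np

def within_homogeneity(X, labels):
    """Toy index: negative mean distance to the own cluster centroid (larger is better)."""
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        total += np.linalg.norm(pts - pts.mean(axis=0), axis=1).sum()
    return -total / len(X)

def separation(X, labels):
    """Toy index: smallest distance between two cluster centroids (larger is better).
    Requires at least two clusters."""
    cents = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])
    d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    return d[np.triu_indices(len(cents), k=1)].min()

def random_clustering(X, k, rng):
    """Hypothetical random clustering: draw k data points as centers and
    assign every point to its nearest center."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def calibrated_aggregate(X, labels, indexes, weights, n_random=100, seed=0):
    """Z-score each index against random clusterings on the same data
    (assumed calibration), then take the weighted mean of the z-scores."""
    rng = np.random.default_rng(seed)
    k = len(np.unique(labels))
    rand = [random_clustering(X, k, rng) for _ in range(n_random)]
    score = 0.0
    for index, w in zip(indexes, weights):
        ref = np.array([index(X, r) for r in rand])  # index values of random clusterings
        z = (index(X, labels) - ref.mean()) / ref.std()  # calibrated index value
        score += w * z
    return score / sum(weights)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated Gaussian blobs as toy data.
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(10, 1, (30, 2))])
    good = np.repeat([0, 1], 30)  # matches the blobs
    bad = np.tile([0, 1], 30)     # mixes the blobs
    idx = [within_homogeneity, separation]
    print(calibrated_aggregate(X, good, idx, [1.0, 1.0]))
    print(calibrated_aggregate(X, bad, idx, [1.0, 1.0]))
```

Because calibration puts all indexes on a common scale, the weights can express which aspects of clustering quality matter in the application at hand; under these assumptions, the clustering that matches the blobs should receive the higher aggregated score.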