Paper title
IPD: An Incremental Prototype based DBSCAN for large-scale data with cluster representatives
Paper authors
Abstract
DBSCAN is a fundamental density-based clustering technique that can identify clusters of arbitrary shape. However, it becomes infeasible when handling big data. On the other hand, centroid-based clustering is important for detecting patterns in a dataset, since unprocessed data points can be given the label of their nearest centroid. However, it cannot detect non-spherical clusters. For large data, it is not feasible to compute and store the labels of every sample; instead, labels can be computed as and when the information is required. This is possible when clustering acts as a tool to identify cluster representatives and each query is served by assigning the cluster label of its nearest representative. In this paper, we propose an Incremental Prototype-based DBSCAN (IPD) algorithm, which is designed to identify arbitrary-shaped clusters in large-scale data. Additionally, it chooses a set of representatives for each cluster.
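The query step described above (label an unseen point with the cluster of its nearest representative) can be illustrated with a minimal sketch. This is not the IPD algorithm: this section does not say how IPD selects representatives, so the representatives below are hand-picked toy values, and the helper `nearest_label` is hypothetical. The example also shows why one centroid per cluster can mislabel points of an elongated (non-spherical) cluster, while several representatives spread along it do not.

```python
import math
from typing import Sequence

# A point is a tuple of coordinates; a prototype pairs a point with a cluster label.
Point = tuple[float, ...]


def nearest_label(query: Point, prototypes: Sequence[tuple[Point, str]]) -> str:
    """Hypothetical helper: return the label of the prototype closest to `query`
    under Euclidean distance (linear scan over all prototypes)."""
    _, label = min(prototypes, key=lambda p: math.dist(query, p[0]))
    return label


# Toy data (assumed): cluster "A" is a long horizontal segment from x=0 to x=10,
# cluster "B" is a small blob around (12, 0).
representatives = [
    ((0.0, 0.0), "A"),
    ((5.0, 0.0), "A"),
    ((10.0, 0.0), "A"),
    ((12.0, 0.0), "B"),
]
# One centroid per cluster, as in centroid-based labeling.
centroids = [((5.0, 0.0), "A"), ((12.0, 0.0), "B")]

query = (10.0, 0.0)  # lies at the right end of cluster "A"
print(nearest_label(query, representatives))  # A
print(nearest_label(query, centroids))        # B (centroid of "A" is 5 away, "B" only 2)
```

The linear scan costs one distance computation per representative for each query; with many representatives, a spatial index such as a k-d tree would be the usual way to speed up the nearest-neighbour lookup.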