Paper title
Automatic selection of clustering algorithms using supervised graph embedding
Paper authors
Paper abstract
The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection: identifying the high-performing algorithm(s) for a given dataset, task, and evaluation measure. This study addresses the algorithm selection challenge for data clustering, a fundamental data mining task that aims to group similar objects. We present MARCO-GE, a novel meta-learning approach for the automated recommendation of clustering algorithms. MARCO-GE first transforms datasets into graphs and then uses a graph convolutional neural network technique to extract their latent representations. Using the resulting embeddings, MARCO-GE trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset and clustering evaluation measure. Extensive evaluation on 210 datasets, 13 clustering algorithms, and 10 clustering measures demonstrates the effectiveness of our approach and its superiority over state-of-the-art clustering meta-learning approaches in terms of predictive and generalization performance.
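The three stages named in the abstract (dataset to graph, graph convolution to an embedding, ranking meta-model) can be illustrated with a minimal, hypothetical sketch. It is not the MARCO-GE implementation. The abstract does not say how graphs are built, which GCN variant or readout is used, or how the ranking meta-model works. The k-nearest-neighbor graph, the single Kipf-and-Welling-style GCN layer with fixed weights, the mean-pooling readout, and the distance-weighted nearest-neighbor ranker below (`knn_graph`, `gcn_layer`, `mean_pool`, `rank_algorithms`) are therefore all assumptions chosen for brevity.

```python
# Hypothetical sketch of a MARCO-GE-style pipeline; every design choice below is an assumption,
# not the method described in the paper.
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Matrix:
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix (assumed graph construction)."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance to point i.
        neighbors = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in neighbors[:k]:
            adj[i][j] = adj[j][i] = 1.0  # add the edge in both directions
    return adj


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    out_dim = len(weights[0])
    # Add self-loops, then compute node degrees of A + I.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Linear transform of node features: X W.
    xw = [[sum(x * weights[f][o] for f, x in enumerate(row)) for o in range(out_dim)]
          for row in features]
    # Symmetrically normalized neighborhood aggregation followed by ReLU.
    return [[max(0.0, sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * xw[j][o] for j in range(n)))
             for o in range(out_dim)] for i in range(n)]


def mean_pool(node_embeddings: Matrix) -> List[float]:
    """Average node embeddings into one dataset-level vector (assumed readout)."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


def rank_algorithms(query: List[float], train_embeddings: Matrix,
                    train_scores: List[Dict[str, Dict[str, float]]], measure: str) -> List[str]:
    """Rank algorithms for `measure` by a distance-weighted average of their scores on
    previously seen datasets (a simple stand-in for a learned ranking meta-model).

    Assumes every training dataset reports the same algorithms for `measure`
    and that a higher score is better for that measure.
    """
    totals: Dict[str, float] = {}
    weight_sum = 0.0
    for emb, scores in zip(train_embeddings, train_scores):
        w = 1.0 / (1.0 + math.dist(query, emb))  # closer datasets get larger weights
        weight_sum += w
        for algo, s in scores[measure].items():
            totals[algo] = totals.get(algo, 0.0) + w * s
    predicted = {algo: t / weight_sum for algo, t in totals.items()}
    return sorted(predicted, key=predicted.get, reverse=True)
```

Per its title, MARCO-GE learns the graph embedding with supervision, so a faithful version would train the GCN weights rather than fix them. The ranker here only shows how per-measure algorithm scores recorded on earlier datasets could be transferred to a new dataset through embedding similarity.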