Paper title
Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach
Paper authors
Abstract
Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, with the aim of constructing well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide the researcher with a useful aid for identifying patterns in the data. In large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful presentations concerning suitable parameter selection and initialization. At the same time, the article is not only a review highlighting the major elements of the examined clustering techniques; it also emphasizes a comparison of these algorithms' clustering efficiency on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when confronted with discrete and continuous observations. The results help us draw valuable conclusions about the appropriateness of the examined clustering techniques in relation to the dataset's size.