Paper Title

Adaptive Explicit Kernel Minkowski Weighted K-means

Authors

Amir Aradnia, Maryam Amir Haeri, Mohammad Mehdi Ebadzadeh

Abstract

The K-means algorithm is among the most commonly used data clustering methods. However, regular K-means can only be applied in the input space, and it is suitable only when the clusters are linearly separable. Kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods often operate on the kernel matrix of the data, which scales poorly with the size of the dataset, or suffer from high clustering cost due to repeated evaluations of kernel values. Another issue is that such algorithms access the data only through evaluations of $K(x_i, x_j)$, which limits many operations that can be performed on the data during the clustering task. This paper proposes a method that combines the advantages of the linear and nonlinear approaches by deriving corresponding approximate finite-dimensional feature maps based on spectral analysis. Previously, approximate finite-dimensional feature maps had been discussed only for Support Vector Machine (SVM) problems. We suggest applying this technique to kernel K-means, as it alleviates storing a huge kernel matrix in memory, allows cluster centers to be computed more efficiently, and gives explicit access to the data in the feature space. These explicit feature maps enable us to work on the data in the feature space directly and to take advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-means) method is more adaptive and finds best-fitting values in the new space by introducing an additional Minkowski exponent and feature weight parameters. Moreover, it can reduce the impact of distance concentration on nearest-neighbour search by investigating norms other than the Euclidean norm, including Minkowski norms and fractional norms (an extension of the Minkowski norms with p < 1).
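
To make the pipeline concrete, below is a minimal sketch, not the authors' implementation: the RBF kernel is replaced by an explicit random Fourier feature map (one standard way to obtain an approximate finite-dimensional feature map from a kernel's spectral representation), and a simple Minkowski weighted K-means is then run in that explicit space. The function names, the choice of random Fourier features, the mean-based center update, and the dispersion-based weight update (assuming p > 1) are all illustrative assumptions.

```python
import numpy as np

def random_fourier_features(X, n_components=200, gamma=1.0, seed=0):
    """Explicit map z(x) with z(x) @ z(y) ~= exp(-gamma * ||x - y||**2),
    i.e. an approximate finite-dimensional feature map for the RBF kernel."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_components))
    b = rng.uniform(0.0, 2 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

def explicit_mwk_means(Z, k, p=1.5, n_iter=50, seed=0):
    """Minkowski weighted K-means on explicit features Z (assumes p > 1).

    Points are assigned by the weighted Minkowski distance
        d(z, c) = sum_v w_v**p * |z_v - c_v|**p,
    and feature weights w_v are recomputed from the per-feature
    within-cluster dispersions D_v."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    centers = Z[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(m, 1.0 / m)  # feature weights, kept summing to 1
    for _ in range(n_iter):
        # assignment step: (n, k) weighted p-th-power distances to centers
        dist = (np.abs(Z[:, None, :] - centers[None, :, :]) ** p * w ** p).sum(-1)
        labels = dist.argmin(axis=1)
        # center update: the mean is a simple surrogate for the true
        # per-coordinate Minkowski center (exact only for p = 2)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
        # weight update: features with low within-cluster dispersion
        # receive higher weight, w_v = 1 / sum_u (D_v / D_u)^(1/(p-1))
        D = (np.abs(Z - centers[labels]) ** p).sum(axis=0) + 1e-12
        w = 1.0 / ((D[:, None] / D[None, :]) ** (1.0 / (p - 1))).sum(axis=1)
    return labels, centers, w

# Toy usage: two noisy concentric circles are not linearly separable in the
# input space, but become separable after the explicit kernel feature map.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, size=400)
radii = np.repeat([1.0, 3.0], 200) + rng.normal(scale=0.1, size=400)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
Z = random_fourier_features(X, n_components=300, gamma=0.5)
labels, centers, weights = explicit_mwk_means(Z, k=2, p=1.5)
```

Because Z is an ordinary data matrix, the clustering step never forms the n-by-n kernel matrix, which is the memory and access advantage the abstract describes; the exponent p and the learned weights can then be tuned in this explicit space.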
