Paper title
Relational Thematic Clustering with Mutually Preferred Neighbors
Paper authors
Paper abstract
Automatically learning thematic clusters in network data has long been a challenging task in the machine learning community. A number of approaches have been proposed for it, utilizing edges, vertex features, or both. However, few of them consider how quantifying the dichotomous inclination w.r.t. network topology and vertex features may influence vertex-cluster preferences, which prevents previous methods from uncovering more interpretable latent groups in network data. To fill this void, we propose a novel probabilistic model, dubbed Relational Thematic Clustering with Mutually Preferred Neighbors (RTCMPN). Unlike prevalent approaches, which predetermine the learning significance of edge structure and vertex features, RTCMPN additionally learns the latent preferences indicating which neighboring vertices are more likely to be in the same cluster, and the dichotomous inclinations describing how the relative significance of edge structure and vertex features may affect the association between pairs of vertices. RTCMPN can therefore learn a cluster structure that incorporates edge structure, vertex features, neighboring preferences, and vertex-vertex dichotomous inclinations. We additionally derive an effective Expectation-Maximization algorithm for RTCMPN to infer the optimal model parameters. RTCMPN is compared with several strong baselines on various network datasets, and the results validate its effectiveness.