Kernel learning approaches for summarising and combining posterior similarity matrices

Paper title

Kernel learning approaches for summarising and combining posterior similarity matrices

Authors

Alessandra Cabassi, Sylvia Richardson, Paul D. W. Kirk

Abstract

When using Markov chain Monte Carlo (MCMC) algorithms to perform inference for Bayesian clustering models, such as mixture models, the output is typically a sample of clusterings (partitions) drawn from the posterior distribution. In practice, a key challenge is how to summarise this output. Here we build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings, and otherwise exploit the information summarised by PSMs. For example, if we have multiple PSMs, each corresponding to a different dataset on a common set of statistical units, we may use standard methods for combining kernels in order to perform integrative clustering. We may moreover embed PSMs within predictive kernel models in order to perform outcome-guided data integration. We demonstrate the performances of the proposed methods through a range of simulation studies as well as two real data applications. R code is available at https://github.com/acabassi/combine-psms.
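
To make the central observation concrete, the following is a minimal sketch in pure Python, not the authors' R implementation. It builds a PSM from a handful of made-up MCMC partition samples, illustrates why the matrix is positive semi-definite (the quadratic form x'Kx equals an average of sums of squared within-cluster totals, so it cannot be negative), and forms a convex combination of two PSMs. The function names, the toy label vectors, and the fixed equal weights are all assumptions chosen for illustration; the abstract only says that standard kernel-combination methods are used and does not specify how weights are chosen.

```python
import random


def posterior_similarity_matrix(samples):
    """Entry (i, j) is the fraction of sampled partitions in which
    units i and j are assigned to the same cluster."""
    n = len(samples[0])
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += 1.0
    return [[value / len(samples) for value in row] for row in psm]


def quadratic_form(matrix, x):
    """Direct evaluation of x' K x."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


def quadratic_form_via_clusters(samples, x):
    """The same quantity written as an average, over samples, of the
    sum over clusters of (sum of x_i in the cluster) squared.
    Every term is a square, which is why a PSM is positive semi-definite."""
    total = 0.0
    for labels in samples:
        cluster_sums = {}
        for label, xi in zip(labels, x):
            cluster_sums[label] = cluster_sums.get(label, 0.0) + xi
        total += sum(s * s for s in cluster_sums.values())
    return total / len(samples)


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices. With non-negative weights the
    result is again positive semi-definite."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical MCMC output: three sampled partitions of four units,
# for each of two datasets observed on the same units.
samples_a = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
samples_b = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]]

psm_a = posterior_similarity_matrix(samples_a)
psm_b = posterior_similarity_matrix(samples_b)

# Equal weights are an arbitrary illustrative choice.
combined = combine_kernels([psm_a, psm_b], [0.5, 0.5])

# Compare the two ways of computing the quadratic form on a random vector.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
q_direct = quadratic_form(psm_a, x)
q_squares = quadratic_form_via_clusters(samples_a, x)
```

Because the combined matrix is itself a valid kernel, it could in principle be passed to any kernel-based clustering or prediction method to obtain a summary or integrative clustering; that step is left out of this sketch.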
