Paper title

Flexible clustering via hidden hierarchical Dirichlet priors

Authors

Antonio Lijoi, Igor Prünster, Giovanni Rebaudo

Abstract

The Bayesian approach to inference stands out for naturally allowing information to be borrowed across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which, however, has the drawback of grouping distributions into a single cluster whenever ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures, and we derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this allows one to gain deeper insight into the theoretical properties of the model; on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme which, as a by-product, allows testing homogeneity between different populations. Finally, we perform a comparison with the nested Dirichlet process and provide illustrative examples on both synthetic and real data.
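The contrast the abstract draws with the nested Dirichlet process can be illustrated by forward simulation. The sketch below is not the authors' MCMC sampler. It assumes a truncated stick-breaking representation in which a Dirichlet process over distributions (concentration `beta`) has atoms G_k drawn, conditionally on a shared discrete G0 ~ DP(gamma, N(0,1)), from DP(alpha, G0), which is our reading of the hidden hierarchical construction. Setting `hidden_hierarchical=False` draws the inner atoms from the diffuse N(0,1) instead, mimicking the nested Dirichlet process. The truncation levels `K`, `L`, `H0`, the parameter values, the standard normal base measure and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights of a Dirichlet process (last stick absorbs the rest)."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # close the stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = v * remaining
    return w / w.sum()  # guard against floating-point drift before rng.choice


def sample_populations(n_per_pop, hidden_hierarchical, rng,
                       beta=1.0, alpha=1.0, gamma=1.0, K=20, L=30, H0=20):
    """Draw data for len(n_per_pop) populations from a truncated two-level prior.

    hidden_hierarchical=True : inner atoms are drawn from a shared discrete
                               G0 ~ DP(gamma, N(0,1)) (assumed HHDP-style).
    hidden_hierarchical=False: inner atoms are drawn iid from the diffuse
                               N(0,1) (nested-DP-style baseline).
    """
    if hidden_hierarchical:
        g0_atoms = rng.normal(size=H0)
        g0_weights = stick_breaking(gamma, H0, rng)
        atoms = g0_atoms[rng.choice(H0, size=(K, L), p=g0_weights)]
    else:
        atoms = rng.normal(size=(K, L))
    # Weights of each distributional atom G_k and of the outer DP over distributions.
    inner_weights = np.stack([stick_breaking(alpha, L, rng) for _ in range(K)])
    outer_weights = stick_breaking(beta, K, rng)

    # Each population picks a distributional cluster, then samples from that G_k.
    labels = rng.choice(K, size=len(n_per_pop), p=outer_weights)
    data = []
    for j, n_j in enumerate(n_per_pop):
        k = labels[j]
        idx = rng.choice(L, size=n_j, p=inner_weights[k])
        data.append(atoms[k, idx])
    return labels, data, atoms


def cross_cluster_ties(labels, data):
    """Count pairs of populations in different distributional clusters that share a value."""
    ties = 0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if labels[i] != labels[j] and np.intersect1d(data[i], data[j]).size > 0:
                ties += 1
    return ties


if __name__ == "__main__":
    for hidden in (False, True):
        rng = np.random.default_rng(0)
        labels, data, _ = sample_populations([50, 50, 50, 50], hidden, rng)
        print("hidden" if hidden else "nested", labels, cross_cluster_ties(labels, data))
```

Under the nested-style baseline, `cross_cluster_ties` is zero with probability one: inner atoms of distinct distributions are almost surely distinct, so a value shared by two samples forces them into the same distributional cluster. Under the hierarchical variant, distinct G_k draw their atoms from the same discrete G0, so such ties can occur without merging the distributions.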