Paper title

Generalized mixtures of finite mixtures and telescoping sampling

Paper authors

Frühwirth-Schnatter, Sylvia; Malsiner-Walli, Gertraud; Grün, Bettina

Abstract

Within a Bayesian framework, a comprehensive investigation of mixtures of finite mixtures (MFMs), i.e., finite mixtures with a prior on the number of components, is performed. This model class has applications in model-based clustering as well as for semi-parametric density estimation and requires suitable prior specifications and inference methods to exploit its full potential. We contribute by considering a generalized class of MFMs where the hyperparameter $\gamma_K$ of a symmetric Dirichlet prior on the weight distribution depends on the number of components. We show that this model class may be regarded as a Bayesian non-parametric mixture outside the class of Gibbs-type priors. We emphasize the distinction between the number of components $K$ of a mixture and the number of clusters $K_+$, i.e., the number of filled components given the data. In the MFM model, $K_+$ is a random variable and its prior depends on the prior on $K$ and on the hyperparameter $\gamma_K$. We employ a flexible prior distribution for the number of components $K$ and derive the corresponding prior on the number of clusters $K_+$ for generalized MFMs. For posterior inference, we propose the novel telescoping sampler which allows Bayesian inference for mixtures with arbitrary component distributions without resorting to reversible jump Markov chain Monte Carlo (MCMC) methods. The telescoping sampler explicitly samples the number of components, but otherwise requires only the usual MCMC steps of a finite mixture model. The ease of its application using different component distributions is demonstrated on several data sets.
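The distinction the abstract draws between the number of components $K$ and the number of filled components $K_+$ can be illustrated with a small prior simulation. The sketch below is not the authors' code and is not the telescoping sampler. It only draws $K$ from a prior, allocates $N$ observations under a symmetric Dirichlet prior with parameter $\gamma_K$ (weights integrated out, using the standard Pólya-urn predictive), and counts the non-empty components. The geometric prior on $K - 1$, the two choices of $\gamma_K$ (a constant, and $\alpha / K$), and all function names are assumptions made for illustration.

```python
import random


def sample_k(rng, p=0.5):
    """Hypothetical prior on K (an assumption): K - 1 is geometric on {0, 1, ...}."""
    k = 1
    while rng.random() > p:
        k += 1
    return k


def sample_k_plus(n_obs, k, gamma_k, rng):
    """Allocate n_obs observations to k components and return K_+.

    The weights have a symmetric Dirichlet(gamma_k) prior and are integrated
    out, so each new observation joins component j with probability
    proportional to counts[j] + gamma_k (Polya-urn predictive).
    """
    counts = [0] * k
    for _ in range(n_obs):
        weights = [c + gamma_k for c in counts]
        j = rng.choices(range(k), weights=weights)[0]
        counts[j] += 1
    # K_+ is the number of components that received at least one observation.
    return sum(1 for c in counts if c > 0)


def simulate_prior(n_obs, n_draws, gamma_of_k, seed=0):
    """Return n_draws pairs (K, K_+) drawn from the joint prior."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        k = sample_k(rng)
        draws.append((k, sample_k_plus(n_obs, k, gamma_of_k(k), rng)))
    return draws


# gamma_K held constant (assumed value 1.0).
static = simulate_prior(100, 500, lambda k: 1.0)
# gamma_K shrinking with K (assumed form alpha / K with alpha = 1.0).
dynamic = simulate_prior(100, 500, lambda k: 1.0 / k)

for name, draws in (("static", static), ("dynamic", dynamic)):
    mean_k = sum(k for k, _ in draws) / len(draws)
    mean_k_plus = sum(kp for _, kp in draws) / len(draws)
    print(f"{name}: mean K = {mean_k:.2f}, mean K_+ = {mean_k_plus:.2f}")
```

In every draw $K_+$ is at least 1 and at most $\min(K, N)$, and its distribution shifts when either the prior on $K$ or the rule for $\gamma_K$ changes, which is the dependence the abstract describes.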
