Paper title
A Dirichlet Process Mixture Model for Directional-Linear Data
Authors
Abstract
Directional data require specialized probability models because their domain is non-Euclidean and periodic. When a directional variable is observed jointly with linear variables, modeling their dependence adds a further layer of complexity. This paper introduces a novel Bayesian nonparametric approach for directional-linear data based on the Dirichlet process. We first extend the projected normal distribution to model the joint distribution of linear variables and a directional variable of arbitrary dimension as a projection of a higher-dimensional augmented multivariate normal distribution (MVN). We call the new distribution the semi-projected normal distribution (SPN); it possesses properties similar to those of the MVN. The SPN is then used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. We propose a normal conditional inverse-Wishart distribution as part of the prior distribution to address an identifiability issue inherited from the projected normal and to preserve conjugacy with the SPN distribution. A Gibbs sampling algorithm is provided for posterior inference. Experiments on synthetic data and the Berkeley image database show superior clustering performance of the Dirichlet process SPN mixture model (DPSPN) compared to other directional-linear models. We also build a hierarchical Dirichlet process model with the SPN to develop a likelihood ratio approach to bloodstain pattern analysis, using the DPSPN model for density estimation to estimate the likelihood of a given pattern from a set of training data.
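The SPN construction described in the abstract can be illustrated with a minimal sampling sketch. It assumes that the first `n_dir` coordinates of the augmented normal vector form the directional block and that the remaining coordinates are the linear variables. The abstract does not state the paper's exact parameterization, so the function name `sample_spn`, the argument layout, and the example parameters are all hypothetical.

```python
import numpy as np


def sample_spn(mean, cov, n_dir, size, rng=None):
    """Draw samples by projecting part of an augmented normal vector.

    Assumption (not from the paper): the first `n_dir` coordinates are the
    directional block; the remaining coordinates are the linear variables.
    """
    rng = np.random.default_rng(rng)
    # Sample the higher-dimensional augmented multivariate normal.
    z = rng.multivariate_normal(mean, cov, size=size)
    # Project the directional block onto the unit sphere.
    x_dir = z[:, :n_dir]
    radius = np.linalg.norm(x_dir, axis=1, keepdims=True)
    directions = x_dir / radius
    # The linear block is kept unchanged.
    linear = z[:, n_dir:]
    return directions, linear


# Hypothetical example: one circular variable (n_dir=2) and one linear variable.
mean = np.array([1.0, 0.0, 2.0])
cov = np.array([
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.3],
    [0.5, 0.3, 2.0],
])
directions, linear = sample_spn(mean, cov, n_dir=2, size=1000, rng=0)
```

Rescaling the directional coordinates of the augmented vector by a positive constant leaves the projected direction unchanged, which is one way to see the identifiability issue the abstract mentions.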