Paper Title

Maximum Likelihood Estimation of Log-Concave Densities on Tree Space

Authors

Yuki Takazawa, Tomonari Sei

Abstract

Phylogenetic trees are key data objects in biology, and methods of phylogenetic reconstruction are highly developed. The space of phylogenetic trees is a nonpositively curved metric space, and statistical methods that exploit this property to analyze sets of trees have recently been developed. Meanwhile, in Euclidean space, the log-concave maximum likelihood method has emerged as a new nonparametric method for probability density estimation. In this paper, we derive a sufficient condition for the existence and uniqueness of the log-concave maximum likelihood estimator on tree space. We also propose an estimation algorithm for one and two dimensions. Because various factors affect the inferred trees, it is difficult to specify the distribution of sample trees. The class of log-concave densities is nonparametric, and yet estimation can be carried out by maximum likelihood without selecting hyperparameters. We numerically compare the estimation performance with that of a previously developed kernel density estimator. In our examples where the true density is log-concave, we demonstrate that our estimator has a smaller integrated squared error when the sample size is large. We also conduct numerical clustering experiments using the Expectation-Maximization (EM) algorithm and compare the results with k-means++ clustering using the Fréchet mean.
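
For orientation, the estimator and the error measure named in the abstract can be written roughly as follows. This is a sketch in notation assumed here, not taken from the paper: $\mathcal{T}$ denotes tree space, $\mu$ a reference measure on it, $X_1, \dots, X_n$ the sample trees, and $f_0$ the true density. Log-concavity is assumed to mean concavity of $\log f$ along geodesics of $\mathcal{T}$.

```latex
% Assumed notation: \mathcal{T} (tree space), \mu (reference measure),
% X_1, ..., X_n (sample trees), f_0 (true density). Requires amsmath.
\begin{align*}
  \mathcal{F} &= \Bigl\{ f : \mathcal{T} \to [0, \infty) \;\Bigm|\;
      \log f \text{ is concave along geodesics},\
      \int_{\mathcal{T}} f \, d\mu = 1 \Bigr\}, \\
  \hat{f}_n &\in \operatorname*{arg\,max}_{f \in \mathcal{F}}
      \frac{1}{n} \sum_{i=1}^{n} \log f(X_i), \\
  \mathrm{ISE}(\hat{f}_n) &= \int_{\mathcal{T}}
      \bigl( \hat{f}_n(x) - f_0(x) \bigr)^2 \, d\mu(x).
\end{align*}
```

The maximization runs over the whole class $\mathcal{F}$, with no bandwidth or other tuning parameter, which is the sense in which the abstract says no hyperparameters need to be selected.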
