Paper title
Bayesian model selection for multilevel models using integrated likelihoods
Paper authors
Paper abstract
Multilevel linear models allow flexible statistical modelling of complex data with different levels of stratification. Identifying the most appropriate model from the large set of possible candidates is a challenging problem. In the Bayesian setting, the standard approach is a comparison of models using the model evidence or the Bayes factor. Explicit expressions for these quantities are available for the simplest linear models with unrealistic priors, but in most cases, direct computation is impossible. In practice, Markov Chain Monte Carlo approaches are widely used, such as sequential Monte Carlo, but it is not always clear how well such techniques perform. We present a method for estimation of the log model evidence, by an intermediate marginalisation over non-variance parameters. This reduces the dimensionality of any Monte Carlo sampling algorithm, which in turn yields more consistent estimates. The aim of this paper is to show how this framework fits together and works in practice, particularly on data with hierarchical structure. We illustrate this method on simulated multilevel data and on a popular dataset containing levels of radon in homes in the US state of Minnesota.
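The marginalisation idea in the abstract can be illustrated on a toy Gaussian linear model. The sketch below is not the authors' algorithm. It assumes a prior beta ~ N(0, tau2 I) on the non-variance parameters, an inverse-gamma prior on the noise variance sigma2, and a fixed tau2 for brevity; the function names and hyperparameters are hypothetical. The coefficients are integrated out in closed form, so the Monte Carlo average runs over the variance parameter only.

```python
import numpy as np


def log_integrated_likelihood(y, X, sigma2, tau2):
    """log p(y | sigma2, tau2) with beta ~ N(0, tau2 * I) integrated out.

    Under this assumed prior, y | sigma2, tau2 ~ N(0, sigma2 * I + tau2 * X X^T).
    """
    n = y.shape[0]
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = np.linalg.solve(chol, y)            # so z @ z = y^T cov^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)


def log_evidence_mc(y, X, tau2, a=2.0, b=1.0, n_draws=2000, seed=0):
    """Plain Monte Carlo estimate of the log evidence.

    Draws sigma2 from an assumed InvGamma(shape=a, scale=b) prior and averages
    the integrated likelihood; tau2 is held fixed (an illustrative shortcut).
    """
    rng = np.random.default_rng(seed)
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ InvGamma(shape=a, scale=b).
    sigma2_draws = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=n_draws)
    logs = np.array(
        [log_integrated_likelihood(y, X, s2, tau2) for s2 in sigma2_draws]
    )
    m = logs.max()                           # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(logs - m)))


if __name__ == "__main__":
    # Simulated data: 3 groups with their own intercepts plus one covariate.
    rng = np.random.default_rng(42)
    groups = np.repeat(np.arange(3), 10)
    X = np.column_stack([np.eye(3)[groups], rng.normal(size=30)])
    y = X @ np.array([0.5, -0.2, 1.0, 0.8]) + rng.normal(scale=0.5, size=30)
    print(log_evidence_mc(y, X, tau2=1.0))
```

Sampling from the prior is the crudest possible estimator and is used here only to keep the sketch short. The point it shows is that the sampler's dimension is the number of variance parameters rather than the full parameter count.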