Importance Sampling Methods for Bayesian Inference with Partitioned Data

Paper Title

Importance Sampling Methods for Bayesian Inference with Partitioned Data

Author

Box, Marc

Abstract

This article presents new methodology for sample-based Bayesian inference when data are partitioned and communication between the parts is expensive, as arises by necessity in the context of "big data" or by choice in order to take advantage of computational parallelism. The method, which we call the Laplace enriched multiple importance estimator, uses new multiple importance sampling techniques to approximate posterior expectations using samples drawn independently from the local posterior distributions (those conditioned on isolated parts of the data). We construct Laplace approximations from which additional samples can be drawn relatively quickly and improve the methods in high-dimensional estimation. The methods are "embarrassingly parallel", make no restriction on the sampling algorithm (including MCMC) to use or choice of prior distribution, and do not rely on any assumptions about the posterior such as normality. The performance of the methods is demonstrated and compared against some alternatives in experiments with simulated data.
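
The core idea in the abstract, approximating posterior expectations from samples drawn independently from each local posterior, can be illustrated with a toy sketch. What follows is not the article's Laplace enriched multiple importance estimator. It is plain self-normalised multiple importance sampling that uses an equal-weight mixture of the local posteriors as the proposal, applied to an assumed conjugate normal model (known variance `sigma2`, prior N(0, `tau2`)) in which every local posterior density is available in closed form, which will not hold in general. All function names and parameter values are hypothetical.

```python
import numpy as np

def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def local_posterior(y_part, sigma2, tau2):
    """Mean and variance of the posterior for theta given one part of the data,
    under prior theta ~ N(0, tau2) and likelihood y ~ N(theta, sigma2)."""
    precision = 1.0 / tau2 + len(y_part) / sigma2
    mean = (y_part.sum() / sigma2) / precision
    return mean, 1.0 / precision

def mis_posterior_mean(parts, sigma2, tau2, n_draws, rng):
    """Self-normalised multiple importance sampling estimate of E[theta | all data],
    using only draws from the local posteriors (toy illustration)."""
    params = [local_posterior(p, sigma2, tau2) for p in parts]

    # Independent draws from each local posterior; in this toy model they are
    # exact Gaussian draws, in practice they could come from MCMC on each part.
    theta = np.concatenate(
        [rng.normal(m, np.sqrt(v), n_draws) for m, v in params]
    )

    # Log of the unnormalised full posterior: prior times likelihood of all data.
    y_all = np.concatenate(parts)
    log_target = normal_logpdf(theta, 0.0, tau2) + normal_logpdf(
        y_all[:, None], theta[None, :], sigma2
    ).sum(axis=0)

    # Log density of the equal-weight mixture of local posteriors.
    log_components = np.stack([normal_logpdf(theta, m, v) for m, v in params])
    log_mixture = np.logaddexp.reduce(log_components, axis=0) - np.log(len(params))

    # Self-normalised weights, shifted by the maximum for numerical stability.
    log_w = log_target - log_mixture
    w = np.exp(log_w - log_w.max())
    return np.sum(w * theta) / np.sum(w)

# Simulated data split into four parts (all values are illustrative assumptions).
rng = np.random.default_rng(0)
sigma2, tau2 = 1.0, 10.0
y = rng.normal(1.0, np.sqrt(sigma2), 200)
parts = np.array_split(y, 4)

estimate = mis_posterior_mean(parts, sigma2, tau2, n_draws=2000, rng=rng)
exact, _ = local_posterior(y, sigma2, tau2)  # closed-form full-data posterior mean
print(f"MIS estimate: {estimate:.4f}, exact posterior mean: {exact:.4f}")
```

In this sketch the sampling step for each part touches only that part's data, which is where the "embarrassingly parallel" property shows up; only the final reweighting evaluates the full-data likelihood.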
