Paper Title
Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains
Paper Authors
Paper Abstract
We introduce a class of scalable Bayesian hierarchical models for the analysis of massive geostatistical datasets. The underlying idea combines approaches from high-dimensional geostatistics: we partition the spatial domain and model the regions in the partition using a sparsity-inducing directed acyclic graph (DAG). We extend the model over the DAG to a well-defined spatial process, which we call the Meshed Gaussian Process (MGP). A major contribution is the development of MGPs on tessellated domains, accompanied by a Gibbs sampler for the efficient recovery of spatial random effects. In particular, the cubic MGP (Q-MGP) can harness high-performance computing resources by executing all large-scale operations in parallel within the Gibbs sampler, improving mixing and computing time compared to sequential updating schemes. Unlike some existing models for large spatial data, a Q-MGP facilitates massive caching of expensive matrix operations, making it particularly well suited to spatiotemporal remote-sensing data. We compare Q-MGPs against state-of-the-art methods on large synthetic and real-world data. We also illustrate the approach using Normalized Difference Vegetation Index (NDVI) data from the Serengeti Park region, recovering latent multivariate spatiotemporal random effects at millions of locations. The source code is available at https://github.com/mkln/meshgp.
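The partition-plus-DAG idea in the abstract can be illustrated with a small sketch. It is not taken from the meshgp package. It assumes a two-dimensional unit-square domain cut into a k x k grid of blocks, with each block's parents taken to be its left and lower neighbors. The function names (`block_of`, `build_mesh`, `markov_blanket`, `color_blocks`) and the four-color parity scheme are hypothetical choices made for illustration only.

```python
from itertools import product


def block_of(x, y, k):
    """Map a point in the unit square to its block index on a k x k grid."""
    # Points on the upper boundary are clamped into the last block.
    return (min(int(x * k), k - 1), min(int(y * k), k - 1))


def build_mesh(k):
    """Assumed DAG: each block's parents are its left and lower neighbors."""
    parents = {}
    for i, j in product(range(k), repeat=2):
        pa = []
        if i > 0:
            pa.append((i - 1, j))
        if j > 0:
            pa.append((i, j - 1))
        parents[(i, j)] = pa
    return parents


def markov_blanket(parents):
    """Parents, children, and co-parents of every node in the DAG."""
    children = {v: [] for v in parents}
    for v, pa in parents.items():
        for p in pa:
            children[p].append(v)
    blanket = {}
    for v in parents:
        mb = set(parents[v]) | set(children[v])
        for c in children[v]:
            mb |= set(parents[c])  # co-parents share a child with v
        mb.discard(v)
        blanket[v] = mb
    return blanket


def color_blocks(k):
    """Group blocks by index parity (illustrative four-color scheme)."""
    groups = {}
    for i, j in product(range(k), repeat=2):
        groups.setdefault((i % 2, j % 2), []).append((i, j))
    return groups


parents = build_mesh(4)
blanket = markov_blanket(parents)
groups = color_blocks(4)
```

Under these assumptions, no two blocks in the same color group lie in each other's Markov blanket. Their block-level random effects are therefore conditionally independent given all other blocks, so a Gibbs sweep could in principle update each color group concurrently rather than one block at a time.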