Paper title
Spatial Multivariate Trees for Big Data Bayesian Regression
Authors
Abstract
High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to big data methods that allow the description of complex relationships between several outcomes recorded at high resolutions by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph. Information-theoretic arguments and considerations on computational efficiency guide the construction of the tree and the related efficient sampling algorithms in imbalanced multivariate settings. In addition to simulated data examples, we illustrate SpamTrees using a large climate data set which combines satellite data with land-based station data. Source code is available at https://github.com/mkln/spamtree
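A deliberately simplified sketch can make the abstract's core idea concrete. When latent random effects follow a tree-structured DAG, their joint density factorizes into one conditional Gaussian per node given its parent. Evaluating that density then costs O(n), compared with the O(n^3) of a full Gaussian process likelihood. This is not the SpamTrees algorithm, and it is not the API of the linked repository. It assumes scalar (univariate) nodes at distinct locations on a line, an exponential covariance, and a single parent per node. All function names are hypothetical.

```python
import math


def exp_cov(s1, s2, sigma2=1.0, phi=1.0):
    """Hypothetical helper: exponential covariance sigma2 * exp(-phi * |s1 - s2|) on the real line."""
    return sigma2 * math.exp(-phi * abs(s1 - s2))


def normal_logpdf(x, mean, var):
    """Log-density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def treed_dag_logdensity(locs, w, parent, sigma2=1.0, phi=1.0):
    """Log-density of latent values w under a simplified tree-structured DAG prior.

    locs[i]   : 1-D location of node i (locations must be distinct)
    w[i]      : latent value at node i
    parent[i] : index of node i's parent, or None for the root

    Assumption for illustration: each node conditions only on its single parent,
    so the joint density is prod_i p(w_i | w_parent(i)) and costs O(n) to evaluate.
    """
    total = 0.0
    for i, p in enumerate(parent):
        c_ii = exp_cov(locs[i], locs[i], sigma2, phi)
        if p is None:
            # Root node: marginal zero-mean Gaussian.
            total += normal_logpdf(w[i], 0.0, c_ii)
        else:
            # Non-root node: Gaussian conditional on the parent's latent value.
            c_ip = exp_cov(locs[i], locs[p], sigma2, phi)
            c_pp = exp_cov(locs[p], locs[p], sigma2, phi)
            mean = c_ip / c_pp * w[p]
            var = c_ii - c_ip ** 2 / c_pp
            total += normal_logpdf(w[i], mean, var)
    return total


# Usage: node 0 is the root, and nodes 2 and 3 are siblings under node 1.
locs = [0.0, 1.0, 2.0, 1.5]
parent = [None, 0, 1, 1]
w = [0.3, -0.1, 0.5, 0.2]
print(treed_dag_logdensity(locs, w, parent))
```

With only two nodes, the factorization p(w_0) p(w_1 | w_0) is exact. The approximation, and the savings, come from larger trees, where siblings such as nodes 2 and 3 above are assumed conditionally independent given their parent.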