The Second Competition on Spatial Statistics for Large Datasets

Title

The Second Competition on Spatial Statistics for Large Datasets

Authors

Abdulah, Sameh; Alamri, Faten; Nag, Pratik; Sun, Ying; Ltaief, Hatem; Keyes, David E.; Genton, Marc G.

Abstract

In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has increased rapidly with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive for large datasets on traditional hardware architectures, as it requires substantial computing power and a large memory footprint to perform operations on large dense matrices. Over the years, various approximation methods have been proposed to address such computational issues; however, the community lacks a holistic process to assess their approximation efficiency. To provide a fair assessment, in 2021 we organized the first competition on spatial statistics for large datasets, using data generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of many participants, we organized the second competition in 2022, focusing on predictions for more complex spatial and spatio-temporal processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe the data generation procedure in detail and make these valuable datasets publicly available for wider adoption. Then, we review the methods submitted by fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.
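
The computational bottleneck of kriging can be made concrete with a minimal sketch of zero-mean simple kriging in pure Python. For n observed locations, the predictor c^T Σ^{-1} z requires factorizing the dense n × n covariance matrix Σ, which costs on the order of n^3/3 floating-point operations and O(n^2) memory, so doubling n multiplies the factorization work by roughly eight. The exponential covariance, its parameters (`sigma2`, `beta`), the tiny nugget, and all function names below are illustrative assumptions; this is not ExaGeoStat's API or the competition's data-generation procedure.

```python
import math


def exp_cov(p, q, sigma2=1.0, beta=0.1):
    """Exponential covariance (Matern with smoothness 1/2); parameter values are illustrative."""
    return sigma2 * math.exp(-math.dist(p, q) / beta)


def cholesky(A):
    """Dense Cholesky factorization A = L L^T: about n^3/3 flops, O(n^2) storage."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry of column j.
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b with forward then backward substitution, O(n^2) each."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def simple_kriging(locs, z, new_loc, cov=exp_cov, nugget=1e-10):
    """Zero-mean simple kriging predictor c^T Sigma^{-1} z at new_loc (hypothetical helper)."""
    n = len(locs)
    # Dense n x n covariance matrix; the small nugget keeps it numerically positive definite.
    Sigma = [[cov(locs[i], locs[j]) + (nugget if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Covariances between the prediction location and the observed locations.
    c = [cov(new_loc, p) for p in locs]
    w = solve_cholesky(cholesky(Sigma), z)  # w = Sigma^{-1} z; the O(n^3) step
    return sum(ci * wi for ci, wi in zip(c, w))
```

For example, `simple_kriging([(0.0, 0.0), (0.5, 0.0)], [1.0, -1.0], (0.25, 0.0))` predicts midway between two observations. At an observed location the predictor reproduces the observed value up to the nugget, which is the interpolation property of kriging; far from all data it shrinks toward the zero mean. The cubic cost of `cholesky` is exactly what approximation methods of the kind assessed in the competition aim to reduce.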
