Title
Scalable model selection for spatial additive mixed modeling: application to crime analysis
Authors
Abstract
A rapid growth in spatial open datasets has led to a huge demand for regression approaches that accommodate spatial and non-spatial effects in big data. Regression model selection is particularly important for stably estimating flexible regression models. However, conventional methods can be slow for large samples. Hence, we develop a fast and practical model-selection approach for spatial regression models, focusing on the selection of coefficient types that include constant, spatially varying, and non-spatially varying coefficients. A pre-processing approach, which replaces data matrices with small inner products through dimension reduction, dramatically accelerates the computation speed of model selection. Numerical experiments show that our approach selects the model accurately and computationally efficiently, highlighting the importance of model selection in the spatial regression context. Then, the present approach is applied to open data to investigate local factors affecting crime in Japan. The results suggest that our approach is useful not only for selecting factors influencing crime risk but also for predicting crime events. This scalable model selection will be key to appropriately specifying flexible and large-scale spatial regression models in the era of big data. The developed model-selection approach was implemented in the R package spmoran.
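The general idea behind the pre-processing step can be illustrated with a minimal Python sketch. The sketch makes several simplifying assumptions. It fits ordinary least squares and scores models with BIC, not the restricted likelihood of the mixed model used in spmoran. It uses a hypothetical Gaussian radial basis (`E`) as a stand-in for the Moran eigenvectors. It considers only constant versus spatially varying coefficients, leaving out non-spatially varying ones. The data are simulated for illustration. The point is that after one pass over the n rows to form the inner products `ZtZ`, `Zty`, and `yty`, every candidate coefficient-type combination can be scored without touching the n-row data again.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, n_basis = 20_000, 10

# Simulated coordinates and covariates (illustrative only, not data from the paper).
coords = rng.uniform(0, 1, size=(n, 2))
x1, x2 = rng.normal(size=n), rng.normal(size=n)

# Hypothetical low-rank spatial basis: Gaussian RBFs at random centres,
# standing in for the Moran eigenvectors that spmoran actually uses.
centres = rng.uniform(0, 1, size=(n_basis, 2))
dist2 = ((coords[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
E = np.exp(-dist2 / 0.05)

# True model: x1 has a spatially varying coefficient, x2 a constant one.
beta1 = 1.0 + E @ rng.normal(size=n_basis)
y = 2.0 + beta1 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

# Column blocks; each candidate model is a union of some of these blocks.
blocks = {
    "base": np.column_stack([np.ones(n), E]),  # intercept + spatial basis
    "x1": x1[:, None], "x1_svc": x1[:, None] * E,
    "x2": x2[:, None], "x2_svc": x2[:, None] * E,
}
names = list(blocks)
Z = np.hstack([blocks[k] for k in names])
sizes = [blocks[k].shape[1] for k in names]
starts = np.cumsum([0] + sizes[:-1])
index = {k: np.arange(s, s + m) for k, s, m in zip(names, starts, sizes)}

# Pre-processing: a single O(n p^2) pass; n is not touched after this.
ZtZ, Zty, yty = Z.T @ Z, Z.T @ y, y @ y


def bic(cols):
    """BIC of an OLS fit on columns `cols`, computed from inner products only."""
    A = ZtZ[np.ix_(cols, cols)]
    b = Zty[cols]
    coef = np.linalg.solve(A, b)
    rss = yty - b @ coef  # residual sum of squares via the normal equations
    return n * np.log(rss / n) + len(cols) * np.log(n)


# Score every combination of coefficient types for x1 and x2.
results = {}
for t1, t2 in itertools.product(["const", "svc"], repeat=2):
    chosen = ["base", "x1", "x2"]
    if t1 == "svc":
        chosen.append("x1_svc")
    if t2 == "svc":
        chosen.append("x2_svc")
    cols = np.concatenate([index[k] for k in chosen])
    results[(t1, t2)] = bic(cols)

best = min(results, key=results.get)
print(best)
```

Each candidate fit here costs time cubic in the number of selected columns rather than linear in n. This is why the comparison stays cheap as the sample grows. The sketch shows only this general principle and does not reproduce spmoran's actual estimation or selection procedure.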