An Interpretable Hybrid Predictive Model of COVID-19 Cases using Autoregressive Model and LSTM

Paper title

An Interpretable Hybrid Predictive Model of COVID-19 Cases using Autoregressive Model and LSTM

Authors

Zhang, Yangyi; Tang, Sui; Yu, Guo

Abstract

Coronavirus Disease 2019 (COVID-19) has had a profound impact on global health and the economy, making it crucial to build accurate and interpretable data-driven predictive models for COVID-19 cases to improve policy making. The extremely large scale of the pandemic and its intrinsically changing transmission characteristics pose great challenges for effective COVID-19 case prediction. To address this challenge, we propose a novel hybrid model that combines the interpretability of the autoregressive (AR) model with the predictive power of long short-term memory (LSTM) neural networks. The proposed hybrid model is formalized as a neural network whose architecture connects the two component model blocks, with their relative contributions determined data-adaptively during training. We demonstrate the favorable performance of the hybrid model over its two component models, as well as other popular predictive models, through comprehensive numerical studies on two data sources under multiple evaluation metrics. Specifically, in county-level data from 8 California counties, our hybrid model achieves an average MAPE of 4.173%, outperforming its component AR (5.629%) and LSTM (4.934%) models. In country-level datasets, our hybrid model outperforms widely used predictive models (AR, LSTM, SVM, Gradient Boosting, and Random Forest) in predicting COVID-19 cases in 8 countries around the world. In addition, we illustrate the interpretability of our proposed hybrid model, a key feature not shared by most black-box predictive models for COVID-19 cases. Our study provides a new and promising direction for building effective and interpretable data-driven models, which could have significant implications for public health policy making and for the control of the current and potential future pandemics.
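The abstract reports accuracy as MAPE (mean absolute percentage error). As a reading aid, the following is a minimal sketch of the standard MAPE definition in plain Python. The function name `mape` and the sample numbers are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol (forecast horizon, averaging across counties) is not specified in the abstract.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Standard textbook definition; the paper's own evaluation code may differ.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the relative absolute errors, then scale to percent.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up case counts for illustration only; not data from the paper.
actual_cases = [100.0, 200.0, 400.0]
forecast_cases = [110.0, 180.0, 400.0]
print(f"MAPE = {mape(actual_cases, forecast_cases):.3f}%")
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing forecasts across regions with very different case counts. It is undefined when an actual value is zero, which is why the sketch rejects such inputs.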
