Paper Title
Solar Flares Forecasting Using Time Series and Extreme Gradient Boosting Ensembles
Paper Authors
Paper Abstract
Space weather events may damage several sectors, including aviation, satellites, the oil and gas industry, and electrical systems, leading to economic and commercial losses. Solar flares are among the most significant of these events: they are sudden releases of radiation that can affect the Earth's atmosphere within hours or minutes. It is therefore worth designing high-performance systems for forecasting them. Although the literature offers many approaches to flare forecasting, there is still no consensus on the techniques used to design these systems. Seeking to bring some standardization to the design of flare predictors, in this study we propose a novel methodology for designing such predictors and validate it with extreme gradient boosting tree classifiers and time series. The methodology relies on the following well-defined machine-learning pipeline: (i) univariate feature selection; (ii) randomized hyper-parameter optimization; (iii) imbalanced data treatment; (iv) adjustment of the classifiers' cut-off point; and (v) evaluation under operational settings. To verify the effectiveness of our methodology, we designed and evaluated three proof-of-concept models for forecasting $\geq C$ class flares up to 72 hours ahead. Compared to baseline models, these models significantly increased their true skill statistic (TSS) scores under operational forecasting scenarios, by 0.37 (predicting flares in the next 24 hours), 0.13 (predicting flares within 24-48 hours), and 0.36 (predicting flares within 48-72 hours). Besides increasing the TSS, the methodology also led to significant increases in the area under the ROC curve, corroborating that we improved the positive and negative recalls of the classifiers while decreasing the number of false alarms.
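To make the evaluation metric and step (iv) of the pipeline concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard definition of the true skill statistic, TSS = TP/(TP+FN) - FP/(FP+TN), that is, positive recall minus the false alarm rate. The abstract does not say how the cut-off point is adjusted, so the simple grid search in `best_cutoff` is an assumption, and the function names, labels, and scores are hypothetical illustration values rather than data from the paper.

```python
from typing import List, Sequence, Tuple


def true_skill_statistic(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return TSS = TP/(TP+FN) - FP/(FP+TN) for binary labels (1 = flare)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs at least one positive and one negative sample")
    return tp / (tp + fn) - fp / (fp + tn)


def apply_cutoff(scores: Sequence[float], cutoff: float) -> List[int]:
    """Turn predicted probabilities into 0/1 labels using a cut-off point."""
    return [1 if s >= cutoff else 0 for s in scores]


def best_cutoff(
    y_true: Sequence[int],
    scores: Sequence[float],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Assumed strategy: pick the candidate cut-off with the highest TSS.

    Ties keep the first (lowest-index) candidate.
    """
    best_c, best_tss = candidates[0], float("-inf")
    for c in candidates:
        tss = true_skill_statistic(y_true, apply_cutoff(scores, c))
        if tss > best_tss:
            best_c, best_tss = c, tss
    return best_c, best_tss


# Hypothetical, imbalanced toy data: 6 non-flare days, 2 flare days.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.80]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

default_tss = true_skill_statistic(y_true, apply_cutoff(scores, 0.5))
cutoff, tuned_tss = best_cutoff(y_true, scores, candidates)
print(f"TSS at cut-off 0.5: {default_tss:.3f}")
print(f"best cut-off: {cutoff}, TSS: {tuned_tss:.3f}")
```

On this toy data, lowering the cut-off from the default 0.5 recovers the missed flare at the cost of one false alarm, which raises the TSS. In practice the cut-off would be chosen on validation data and then evaluated on a held-out test set, so that the reported score is not optimistically biased.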