Event Prediction in the Big Data Era: A Systematic Survey

Paper Title

Event Prediction in the Big Data Era: A Systematic Survey

Author

Zhao, Liang

Abstract

Events are occurrences in specific locations, time, and semantics that nontrivially impact either our society or nature, such as civil unrest, system failures, and epidemics. It is highly desirable to be able to anticipate the occurrence of such events in advance in order to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex dependencies, and streaming data feeds. Most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, systematic categorization and summary of existing techniques are presented, which facilitate domain experts' searches for suitable techniques and help model developers consolidate their research at the frontiers. Then, comprehensive categorization and summary of major application domains are provided. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions for this promising and important domain are elucidated and discussed.
