Paper Title

SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial Data

Authors

Anas Daghistani, Walid G. Aref, Arif Ghafoor, Ahmed R. Mahmood

Abstract

The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of spatial data in real time. The current scale of spatial data cannot be handled using centralized systems, which has led to the development of distributed spatial streaming systems. Existing systems use static spatial partitioning to distribute the workload. In contrast, real-time streamed spatial data follows non-uniform spatial distributions that change continuously over time. Distributed spatial streaming systems need to react to changes in the distribution of spatial data and queries. This paper introduces SWARM, a light-weight adaptivity protocol that continuously monitors the data and query workloads across the distributed processes of the spatial data streaming system, and redistributes and rebalances the workloads as soon as performance bottlenecks are detected. SWARM is able to handle multiple query-execution and data-persistence models. A distributed streaming system can directly use SWARM to adaptively rebalance the system's workload among its machines with minimal changes to the original code of the underlying spatial application. Extensive experimental evaluation using real and synthetic datasets illustrates that, on average, SWARM achieves a 200% improvement over a static grid partitioning that is determined based on observing a limited history of the data and query workloads. Moreover, SWARM reduces execution latency by 4x on average compared with the other technique.
