Paper title

RMLStreamer-SISO: an RDF stream generator from streaming heterogeneous data

Authors

Oo, Sitt Min; Haesendonck, Gerald; De Meester, Ben; Dimou, Anastasia

Abstract

Stream-reasoning query languages such as CQELS and C-SPARQL enable query answering over RDF streams. Unfortunately, there is currently a lack of efficient RDF stream generators to feed RDF stream reasoners. State-of-the-art RDF stream generators are limited in the velocity and volume of streaming data they can handle. To generate RDF streams efficiently and in a scalable way, we extended the RMLStreamer to also generate RDF streams from dynamic heterogeneous data streams. This paper introduces a scalable solution that relies on a dynamic window approach to generate RDF streams with low latency and high throughput from multiple heterogeneous data streams. Our evaluation shows that our solution outperforms the state of the art: it achieves millisecond latency (compared to the seconds that state-of-the-art solutions need), constant memory usage for all workloads, and a sustainable throughput of around 70,000 records/s (compared to the 10,000 records/s that state-of-the-art solutions sustain). This opens up access to numerous data streams for integration with the Semantic Web.