Paper Title

PAPyA: Performance Analysis of Large RDF Graphs Processing Made Easy

Authors

Mohamed Ragab, Adam Satria Adidarma, Riccardo Tommasini

Abstract

Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of the performance of Big Data (BD) frameworks. In practice, processing large (RDF) graphs on top of relational BD systems raises several design decisions that cannot be made automatically, e.g., the choice of schema, partitioning technique, and storage format. PPA, and ranking functions in particular, yields actionable insights from performance data, making it easier for practitioners to choose the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPyA, a library for implementing PPA that (1) prepares RDF graph data for a processing pipeline over relational BD systems, (2) automatically ranks performance in a user-defined solution space of experimental dimensions, and (3) supports flexible user-defined extensions in terms of the systems to test and the ranking methods. We showcase PAPyA on a set of experiments based on the SparkSQL framework. PAPyA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPyA as a public open-source library under an MIT license; it is intended as a catalyst for designing new prescriptive analytical techniques for BD applications.
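
To make the idea of ranking a solution space of experimental dimensions concrete, the following is a minimal sketch in plain Python. It does not use PAPyA's API, and the mean-rank criterion is a generic stand-in rather than the ranking function defined in the paper. The configuration labels, query names, and runtimes are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical runtimes in seconds, keyed by (schema, partitioning, storage format).
# All labels and numbers are made up for illustration; they are not measurements.
runtimes = {
    ("schema_A", "partitioning_X", "format_1"): {"Q1": 12.0, "Q2": 30.0, "Q3": 8.0},
    ("schema_B", "partitioning_Y", "format_2"): {"Q1": 9.0, "Q2": 35.0, "Q3": 7.0},
    ("schema_C", "partitioning_Z", "format_3"): {"Q1": 15.0, "Q2": 25.0, "Q3": 10.0},
}


def mean_rank(runtimes):
    """Rank configurations per query (1 = fastest), then average the ranks."""
    queries = sorted({q for per_query in runtimes.values() for q in per_query})
    ranks = defaultdict(list)
    for q in queries:
        # Order configurations by their runtime on this query, fastest first.
        ordered = sorted(runtimes, key=lambda cfg: runtimes[cfg][q])
        for position, cfg in enumerate(ordered, start=1):
            ranks[cfg].append(position)
    return {cfg: sum(r) / len(r) for cfg, r in ranks.items()}


def best_configuration(runtimes):
    """Return the configuration with the lowest (best) mean rank."""
    scores = mean_rank(runtimes)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for cfg, score in sorted(mean_rank(runtimes).items(), key=lambda kv: kv[1]):
        print(cfg, round(score, 2))
    print("best:", best_configuration(runtimes))
```

A rank-based criterion like this compares configurations by how often they win rather than by how much, so a single very slow query does not dominate the outcome. This sketch assumes every configuration has a runtime for every query and breaks ties by insertion order.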
