论文标题

GraphPi:高性能图模式通过有效的冗余消除匹配

GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination

论文作者

Shi, Tianhui, Zhai, Mingshu, Xu, Yi, Zhai, Jidong

论文摘要

旨在在图中发现结构模式的图形模式匹配被认为是许多实际应用中最基本的图挖掘问题之一。尽管以前的努力,现有系统面临两个主要挑战。首先,模式中存在的固有对称性可以引入大量冗余计算。其次,模式的不同匹配顺序具有显着的性能差异,很难预测。当这些因素混合在一起时,这个问题就会变得非常复杂。高效的模式匹配仍然是目前的一个空旷的问题。为了应对这些挑战,我们提出了GraphPi,这是一个高性能分布式模式匹配系统。 GraphPi利用基于组理论中2个循环的新算法来生成多组不对称限制,其中每组可以完全消除冗余计算。我们进一步设计了一个准确的性能模型,以确定最佳匹配顺序和不对称限制设置,以进行有效的模式匹配。我们评估Tianhe-2a超级计算机上的GraphPi。结果表明,对于单个节点上的6个现实世界图数据集,GraphPi的表现优于先进的系统,最高可达105倍。我们还将GraphPi扩展到1,024个计算节点(24,576个核心)。

Graph pattern matching, which aims to discover structural patterns in graphs, is considered one of the most fundamental graph mining problems in many real applications. Despite previous efforts, existing systems face two main challenges. First, inherent symmetry existing in patterns can introduce a large amount of redundant computation. Second, different matching orders for a pattern have significant performance differences and are quite hard to predict. When these factors are mixed, this problem becomes extremely complicated. High efficient pattern matching remains an open problem currently. To address these challenges, we propose GraphPi, a high performance distributed pattern matching system. GraphPi utilizes a new algorithm based on 2-cycles in group theory to generate multiple sets of asymmetric restrictions, where each set can eliminate redundant computation completely. We further design an accurate performance model to determine the optimal matching order and asymmetric restriction set for efficient pattern matching. We evaluate GraphPi on Tianhe-2A supercomputer. Results show that GraphPi outperforms the state-ofthe-art system, by up to 105X for 6 real-world graph datasets on a single node. We also scale GraphPi to 1,024 computing nodes (24,576 cores).

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源