论文标题
Bit-Graphblas:GPU上以矩阵为中心的图形处理的位级优化
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU
论文作者
论文摘要
在像邻接矩阵这样的一般图形数据结构中,当边缘均匀时,可以使用单个位充分表示两个节点的连接性。但是,现有以矩阵的图形处理框架尚未充分利用这种见解。这项工作通过系统地探索图形的比特级表示以及对图形操作的相应优化来填补空隙。它提出了一个两级表示,称为BIT-BLOCK压缩稀疏行(B2SR),并通过利用现代GPU的内在信息来对B2SR上的图形操作进行一系列优化。对NVIDIA PASCAL和VOLTA GPU的评估表明,优化的价格高达$ 40 \ times $和$ 6555 \ $ $ $ $ $ \ times $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $,$ $ $ $,$ $,分别使基于Graphblas的BFS加速,$ 433 \ times $,SSSP,SSSP,pr和$ 35 $ $ 35 \ $ 35 \ $ 35 \ $ 35 \ $ 35 \ $ $ 35 \ $ 35 \ $ $ 35 \
In a general graph data structure like an adjacency matrix, when edges are homogeneous, the connectivity of two nodes can be sufficiently represented using a single bit. This insight has, however, not yet been adequately exploited by the existing matrix-centric graph processing frameworks. This work fills the void by systematically exploring the bit-level representation of graphs and the corresponding optimizations to the graph operations. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to the graph operations on B2SR by leveraging the intrinsics of modern GPUs. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations bring up to $40\times$ and $6555\times$ for essential GraphBLAS kernels SpMV and SpGEMM, respectively, making GraphBLAS-based BFS accelerate up to $433\times$, SSSP, PR, and CC up to $35\times$, and TC up to $52\times$.