Bit-Graphblas：GPU上以矩阵为中心的图形处理的位级优化

论文标题

Bit-Graphblas：GPU上以矩阵为中心的图形处理的位级优化

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU

论文作者

Chen, Jou-An, Sung, Hsin-Hsuan, Shen, Xipeng, Tallent, Nathan, Barker, Kevin, Li, Ang

论文摘要

在像邻接矩阵这样的一般图形数据结构中，当边缘均匀时，可以使用单个位充分表示两个节点的连接性。但是，现有以矩阵的图形处理框架尚未充分利用这种见解。这项工作通过系统地探索图形的比特级表示以及对图形操作的相应优化来填补空隙。它提出了一个两级表示，称为BIT-BLOCK压缩稀疏行（B2SR），并通过利用现代GPU的内在信息来对B2SR上的图形操作进行一系列优化。对NVIDIA PASCAL和VOLTA GPU的评估表明，优化的价格高达$ 40 \ times $和$ 6555 \ $ $ $ $ $ \ times $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $，$ $ $ $，$ $，分别使基于Graphblas的BFS加速，$ 433 \ times $，SSSP，SSSP，pr和$ 35 $ $ 35 \ $ 35 \ $ 35 \ $ 35 \ $ 35 \ $ $ 35 \ $ 35 \ $ $ 35 \

In a general graph data structure like an adjacency matrix, when edges are homogeneous, the connectivity of two nodes can be sufficiently represented using a single bit. This insight has, however, not yet been adequately exploited by the existing matrix-centric graph processing frameworks. This work fills the void by systematically exploring the bit-level representation of graphs and the corresponding optimizations to the graph operations. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to the graph operations on B2SR by leveraging the intrinsics of modern GPUs. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations bring up to $40\times$ and $6555\times$ for essential GraphBLAS kernels SpMV and SpGEMM, respectively, making GraphBLAS-based BFS accelerate up to $433\times$, SSSP, PR, and CC up to $35\times$, and TC up to $52\times$.

下载PDF全文

下载文献需遵守相关版权规定

论文标题