Paper Title
Optimizing Graph Processing and Preprocessing with Hardware Assisted Propagation Blocking
Authors
Abstract
Extensive prior research has focused on alleviating the characteristic poor cache locality of graph analytics workloads. However, graph pre-processing tasks remain relatively unexplored. In many important scenarios, graph pre-processing tasks can be as expensive as the downstream graph analytics kernel. We observe that Propagation Blocking (PB), a software optimization designed for SpMV kernels, generalizes to many graph analytics kernels as well as common pre-processing tasks. In this work, we identify the lingering inefficiencies of a PB execution on conventional multicores and propose architecture support to eliminate PB's bottlenecks, further improving the performance gains from PB. Our proposed architecture -- COBRA -- optimizes the PB execution of both graph processing and pre-processing alike to provide end-to-end speedups of up to 4.6x (3.5x on average).
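As an illustration of the software optimization the abstract refers to, the following is a minimal sketch of Propagation Blocking applied to an SpMV-style update. It assumes the commonly described two-phase structure (a binning phase followed by an accumulate phase). The function names, the edge-list layout, and the `bin_width` parameter are hypothetical choices made for this example and are not taken from the paper. It models only the software technique, not the COBRA hardware support.

```python
def spmv_direct(num_vertices, edges, x):
    """Baseline: apply every update directly, writing to y at irregular positions."""
    y = [0.0] * num_vertices
    for src, dst, weight in edges:
        y[dst] += weight * x[src]
    return y


def spmv_propagation_blocking(num_vertices, edges, x, bin_width):
    """Same result as spmv_direct, computed in two phases.

    bin_width (hypothetical parameter) is the number of consecutive
    destination vertices owned by one bin; in practice it would be chosen
    so that one bin's slice of y fits in cache.
    """
    num_bins = (num_vertices + bin_width - 1) // bin_width
    bins = [[] for _ in range(num_bins)]

    # Phase 1 (binning): stream over the edges and append each update
    # to the bin that owns its destination vertex. Writes are appends,
    # so each bin is filled sequentially.
    for src, dst, weight in edges:
        bins[dst // bin_width].append((dst, weight * x[src]))

    # Phase 2 (accumulate): drain one bin at a time. All updates in a
    # bin fall within the same bin_width-sized range of y.
    y = [0.0] * num_vertices
    for updates in bins:
        for dst, contribution in updates:
            y[dst] += contribution
    return y


# Small example graph as (src, dst, weight) triples.
edges = [(0, 3, 1.0), (1, 3, 2.0), (2, 0, 0.5), (3, 1, 4.0)]
x = [1.0, 2.0, 3.0, 4.0]

y_direct = spmv_direct(4, edges, x)
y_blocked = spmv_propagation_blocking(4, edges, x, bin_width=2)
print(y_direct, y_blocked)
```

Both functions produce the same vector. The blocked version trades extra streaming traffic for the bins against irregular writes confined to a small range, which is the locality benefit usually attributed to this technique.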