论文标题

一种单树算法,用于计算GPU上的欧几里得最小跨越树

A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs

论文作者

Prokopenko, Andrey, Sao, Piyush, Lebrun-Grandié, Damien

论文摘要

计算欧几里得最小跨越树(EMST)是许多算法的计算要求步骤。尽管已知用于计算EMST的工作效率的串行和多线程算法,但由于复杂的分支结构,数据依赖性和负载不平衡,设计有效的GPU算法是具有挑战性的。在本文中,我们提出了一种基于单树borůvka的算法,用于计算GPU上的EMST。我们使用有效的最近的邻居算法,并通过避免在同一组件中使用叶子节点遍历子树来减少所需距离计算的数量。开发的算法是使用基于Kokkos框架的开源几何搜索库Arborx以性能便携式方式实现的。我们在各种2D和3D数据集上评估了所提出的算法,显示并将其与当前最新的开源CPU实现进行了比较。我们在最快的多线程实现上演示了4-24倍的加速。我们通过在各种硬件上提供结果来证明实施的可移植性:AMD EPYC 7763,NVIDIA A100和AMD MI250X。我们显示了实施的可扩展性,在单个A100 NVIDIA GPU上,计算3700万3D宇宙学数据集的EMST。

Computing the Euclidean minimum spanning tree (EMST) is a computationally demanding step of many algorithms. While work-efficient serial and multithreaded algorithms for computing EMST are known, designing an efficient GPU algorithm is challenging due to a complex branching structure, data dependencies, and load imbalances. In this paper, we propose a single-tree Borůvka-based algorithm for computing EMST on GPUs. We use an efficient nearest neighbor algorithm and reduce the number of the required distance calculations by avoiding traversing subtrees with leaf nodes in the same component. The developed algorithms are implemented in a performance portable way using ArborX, an open-source geometric search library based on the Kokkos framework. We evaluate the proposed algorithm on various 2D and 3D datasets, show and compare it with the current state-of-the-art open-source CPU implementations. We demonstrate 4-24x speedup over the fastest multi-threaded implementation. We prove the portability of our implementation by providing results on a variety of hardware: AMD EPYC 7763, Nvidia A100 and AMD MI250X. We show scalability of the implementation, computing EMST for 37 million 3D cosmological dataset in under a 0.5 second on a single A100 Nvidia GPU.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源