论文标题
平行最小跨越树算法和评估
Parallel Minimum Spanning Tree Algorithms and Evaluation
论文作者
论文摘要
最小生成树(MST)是一种重要的图形算法,在计算机网络,VLSI路由,无线通信等领域具有广泛的应用程序。如今,实际上,每台计算机都是由多核处理器构建的。因此,重要的是通过将现有算法和应用并行利用这种并行的计算能力。与PRAM模型的算法有关的大多数与同行MST的早期工作。此类研究有两个局限性。首先,婴儿车模型假设无限记忆带宽是不现实的。其次,基于下车模型的算法至少需要O(n)的处理器,其中n是总顶点的数量。对于大图,这是不可行的。针对真实系统的实现很少。在本文中,我介绍并评估了两种新的平行MST算法,这些算法是平行的Boruvka算法的变体:i)第一个算法使用没有旋转锁的锁定变量ii)第二算法仅使用原子比较比较和swap(CAS)。我评估了这些算法在六核,12个线程Intel系统上的性能,该系统的各种输入图最高100万个顶点。第一个算法在未优化的顺序算法上显示出高达1.94的速度,在优化的顺序算法上,速度最高为1.4。第二算法在未优化的顺序算法上显示高达2.03的加速度,在优化的顺序算法上,速度最高为1.403。当将第二种算法与第一个算法进行比较时,发现第二算法的第二算法比四个线程的第一个算法高出1.15倍。
Minimum Spanning Tree (MST) is an important graph algorithm that has wide ranging applications in the areas of computer networks, VLSI routing, wireless communications among others. Today virtually every computer is built out of multi-core processors. Hence it is important to take advantage of such parallel computing power by parallelizing existing algorithms and applications. Most of the earlier work on parallelizing MST focused on algorithms for PRAM models. There are two limitations to such studies. First, PRAM models assume infinite memory bandwidth which is unrealistic. Second, PRAM model based algorithms require at least O(n) processors where n being total number of vertices. For large graphs this is infeasible. There are very few implementations which target real systems. In this paper I present and evaluate two new parallel MST algorithms that are a variant of Parallel Boruvka algorithms: i) First algorithm uses lock variables without spin-locks ii) Second algorithm uses only atomic compare-and-swap (CAS) primitive. I evaluated the performance of these algorithms on a six-core, 12-thread Intel system on various input graphs of sizes up to 1 million vertices. First algorithm showed a speedup of up to1.94 over an un-optimized sequential algorithm and a speedup of up to 1.4 over an optimized sequential algorithm. Second algorithm showed a speedup of up to 2.03 over an un-optimized sequential algorithm and a speedup of up to 1.403 over an optimized sequential algorithm. When second algorithm using CAS is compared with the first algorithm second algorithm is found to be up to 1.15 times better than the first algorithm at four threads.