论文标题
许多顺序迭代算法可以平行,并且(几乎)工作效率
Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient
论文作者
论文摘要
为了设计有效的并行算法,最近的一些论文表明,许多顺序迭代算法可以直接平行,但在实现工作效率和高平行性方面仍然存在挑战。对于某些问题,依赖数量渐近远远超过最佳顺序工作的某些问题可能很难。为了实现高平行性,我们希望并行处理尽可能多的对象。目标是实现$ \ tilde {o}(d)$ span,以解决最深的依赖长长度$ d $的问题。我们将此属性称为循环效率。在本文中,我们显示了各种经典问题的工作效率和圆形算法,并提出了这样做的一般方法。 为了有效地平行许多顺序迭代算法,我们提出了相位平行的框架。该框架将等级分配给每个对象并相应地处理。所有具有相同等级的对象都可以并行处理。为了实现工作效率和高平行性,我们使用两种类型的一般技术。类型1算法旨在使用范围查询来提取具有相同等级的所有对象,从而避免评估所有依赖性。我们讨论活动选择,无限的背包以及使用类型1框架的更多内容。类型2算法旨在唤醒对象,当它依赖的最后一个对象完成。我们讨论了使用2型框架的活动选择,最长增加的子序列(LIS)和许多其他算法。 我们所有的算法都是(几乎)工作效率和圆形的。他们中的许多人都提高了以前的最佳界限,其中一些是第一个以圆价实现工作效率的人。我们还实施了许多。在具有合理依赖性深度的输入上,我们的算法高度平行,并且明显优于其顺序对应物。
To design efficient parallel algorithms, some recent papers showed that many sequential iterative algorithms can be directly parallelized but there are still challenges in achieving work-efficiency and high-parallelism. Work-efficiency can be hard for certain problems where the number of dependences is asymptotically more than optimal sequential work bound. To achieve high-parallelism, we want to process as many objects as possible in parallel. The goal is to achieve $\tilde{O}(D)$ span for a problem with the deepest dependence length $D$. We refer to this property as round-efficiency. In this paper, we show work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes them accordingly. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank, such that we avoid evaluating all the dependences. We discuss activity selection, unlimited knapsack, and more using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient. Many of them improve previous best bounds, and some of them are the first to achieve work-efficiency with round-efficiency. We also implement many of them. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.