论文标题

自适应自我安排循环调度程序

An Adaptive Self-Scheduling Loop Scheduler

论文作者

Booth, Joshua Dennis, Lane, Phillip

论文摘要

许多共享的记忆并行不规则应用,例如稀疏线性代数和图形算法,以叉子加入方式取决于有效的循环调度(LS),尽管每个循环迭代的工作可能会取决于应用程序和输入。由于其重要性,已经探索了许多不同的方法,例如工作量意识到的自我安排和参数,例如块大小,以实现合理的绩效,需要专家的先验知识,以了解应用程序和输入。这项工作提出了一种新的LS方法,该方法几乎不需要专家知识,即通过基于工作负载差异的启发式和使用工作窃取的自我管理的块大小来实现接近调谐LS方法的加速。该方法命名为\ iChunk,用于用于测试的libgomp中。它是针对OpenMP的指导,动态和Taskloop方法进行评估的,并在一系列应用程序上对BINLPT和通用的工作进行评估,其中包括:合成基准,广度范围搜索,K-Means,分子动力学代码LAVAMD和稀疏的Matrix-vector-vector-vector-vector-vector乘坐。在28线程Intel系统上,\ iChunk是始终是前三种LS方法之一的唯一方法。在所有应用程序中,平均而言,\ ichunk在最佳方法的5.4%之内,甚至能够超过其他LS方法来进行广度优先搜索和K-均值。

Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner despite that the work per loop iteration can greatly vary depending on the application and the input. Because of its importance, many different methods, e.g., workload-aware self-scheduling, and parameters, e.g., chunk size, have been explored to achieve reasonable performance that requires expert prior knowledge about the application and input. This work proposes a new LS method that requires little to no expert knowledge to achieve speedups close to those of tuned LS methods by self-managing chunk size based on a heuristic of workload variance and using work-stealing. This method, named \ichunk, is implemented into libgomp for testing. It is evaluated against OpenMP's guided, dynamic, and taskloop methods and is evaluated against BinLPT and generic work-stealing on an array of applications that includes: a synthetic benchmark, breadth-first search, K-Means, the molecular dynamics code LavaMD, and sparse matrix-vector multiplication. On 28 thread Intel system, \ichunk is the only method to always be one of the top three LS methods. On average across all applications, \ichunk is within 5.4% of the best method and is even able to outperform other LS methods for breadth-first search and K-Means.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源