Paper title
Portable high-order finite element kernels I: Streaming Operations
Paper authors
Paper abstract
This paper is devoted to the development of highly efficient kernels for the vector operations that arise in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements. We propose a suite of new Benchmark Streaming tests that isolate the distinct streaming operations which must be performed. We implement these tests using the OCCA abstraction framework to demonstrate the portability of these streaming operations across different GPU architectures, and we propose a simple performance model for such kernels that can accurately capture data movement rates as well as kernel launch costs.
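The abstract does not state the form of the performance model. As an illustration only, the sketch below assumes one common form for streaming kernels: a fixed launch cost plus the bytes moved divided by a sustained bandwidth, T(B) = T0 + B / W. The function names, the AXPY byte count, and the sample values of T0 and W are all hypothetical and are not taken from the paper.

```python
def streaming_time(bytes_moved, launch_latency_s, bandwidth_bytes_per_s):
    """Assumed model: fixed launch cost plus data movement time."""
    return launch_latency_s + bytes_moved / bandwidth_bytes_per_s


def effective_bandwidth(bytes_moved, launch_latency_s, bandwidth_bytes_per_s):
    """Bandwidth an observer would measure, including the launch cost."""
    t = streaming_time(bytes_moved, launch_latency_s, bandwidth_bytes_per_s)
    return bytes_moved / t


def axpy_bytes(n, word_size=8):
    """Bytes moved by y <- alpha*x + y: read x, read y, write y."""
    return 3 * n * word_size


def fit_model(sizes_bytes, times_s):
    """Least-squares fit of T = T0 + B/W; returns (T0, W)."""
    m = len(sizes_bytes)
    mean_b = sum(sizes_bytes) / m
    mean_t = sum(times_s) / m
    cov = sum((b - mean_b) * (t - mean_t) for b, t in zip(sizes_bytes, times_s))
    var = sum((b - mean_b) ** 2 for b in sizes_bytes)
    slope = cov / var                  # slope is 1/W
    intercept = mean_t - slope * mean_b  # intercept is T0
    return intercept, 1.0 / slope


# Synthetic timings generated from hypothetical parameters (not measurements).
T0_TRUE = 5e-6   # assumed launch latency in seconds
W_TRUE = 1e12    # assumed sustained bandwidth in bytes per second
sizes = [axpy_bytes(n) for n in (10**3, 10**4, 10**5, 10**6, 10**7)]
times = [streaming_time(b, T0_TRUE, W_TRUE) for b in sizes]
t0_fit, w_fit = fit_model(sizes, times)
```

Under this assumed form, small vectors are dominated by the launch cost, so their effective bandwidth falls well below W, while large vectors approach W. Fitting T0 and W to measured timings is one way such a model could be calibrated for each device.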