Accelerating Geometric Multigrid Preconditioning with Half-Precision Arithmetic on GPUs

Paper Title

Accelerating Geometric Multigrid Preconditioning with Half-Precision Arithmetic on GPUs

Authors

Kyaw L. Oo, Andreas Vogel

Abstract

With the hardware support for half-precision arithmetic on NVIDIA V100 GPUs, high-performance computing applications can benefit from lower precision at appropriate spots to speed up the overall execution time. In this paper, we investigate a mixed-precision geometric multigrid method to solve large sparse systems of equations stemming from discretization of elliptic PDEs. While the final solution is always computed with high-precision accuracy, an iterative refinement approach with multigrid preconditioning in lower precision and residuum scaling is employed. We compare the FP64 baseline for Poisson's equation to purely FP16 multigrid preconditioning and to the employment of FP16-FP32-FP64 combinations within a mesh hierarchy. While the iteration count is almost not affected by using lower accuracy, the solver runtime is considerably decreased due to the reduced memory transfer and a speedup of up to 2.5x is gained for the overall solver. We investigate the performance of selected kernels with the hierarchical Roofline model.
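
The iterative refinement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: it assumes a 1D Poisson problem with homogeneous Dirichlet boundaries, emulates FP16 on the CPU by rounding through Python's `struct` format `e`, and swaps the multigrid cycle for a tridiagonal (Thomas) solve carried out entirely in emulated FP16. The function names `fp16`, `apply_poisson`, `precondition_fp16`, and `solve_mixed_precision` are hypothetical. The residual and the solution update stay in FP64, and the residual is scaled to unit maximum norm before it is handed to the low-precision preconditioner.

```python
import math
import struct


def fp16(x):
    """Round an FP64 Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def apply_poisson(u, h):
    """FP64 matrix-free product A*u for the 1D stencil (-1, 2, -1) / h^2."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def precondition_fp16(r):
    """Assumed stand-in for the low-precision multigrid cycle.

    Solves tridiag(-1, 2, -1) y = r with the Thomas algorithm and rounds
    every intermediate result to FP16.
    """
    n = len(r)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = fp16(-1.0 / 2.0)
    d[0] = fp16(fp16(r[0]) / 2.0)
    for i in range(1, n):
        denom = fp16(2.0 + c[i - 1])
        c[i] = fp16(-1.0 / denom)
        d[i] = fp16(fp16(fp16(r[i]) + d[i - 1]) / denom)
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = fp16(d[i] - fp16(c[i] * y[i + 1]))
    return y


def solve_mixed_precision(f, h, tol=1e-10, max_iter=50):
    """Iterative refinement: FP64 residual and update, FP16 preconditioner.

    Assumes f is not identically zero. Returns the solution, the final
    relative residual (max norm), and the number of corrections applied.
    """
    x = [0.0] * len(f)
    norm_f = max(abs(v) for v in f)
    rel_res = 1.0
    iters = 0
    for iters in range(max_iter + 1):
        r = [fi - ai for fi, ai in zip(f, apply_poisson(x, h))]  # FP64 residual
        norm_r = max(abs(v) for v in r)
        rel_res = norm_r / norm_f
        if rel_res <= tol or iters == max_iter:
            break
        # Residual scaling: keep the low-precision input at unit max norm
        # so that it neither overflows nor underflows in FP16.
        scaled = [v / norm_r for v in r]
        y = precondition_fp16(scaled)
        # Undo the scaling (and the 1/h^2 stencil factor) in FP64.
        x = [xi + norm_r * h * h * yi for xi, yi in zip(x, y)]
    return x, rel_res, iters


# Example: -u'' = pi^2 sin(pi x) on (0, 1) with 7 interior grid points.
n = 7
h = 1.0 / (n + 1)
f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
u, rel_res, iters = solve_mixed_precision(f, h)
```

In this sketch a single FP16 solve is only accurate to a few digits, but each refinement step reduces the remaining error by roughly the same factor, so the FP64 residual reaches the tolerance after a handful of corrections. On a GPU the benefit described in the abstract comes from the reduced memory traffic of the low-precision preconditioner, which a CPU emulation like this one does not reproduce.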
