Paper Title

On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

Authors

David B. Williams-Young, Wibe A. de Jong, Hubertus J. J. van Dam, Chao Yang

Abstract

The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations that are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance which have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing it to the existing scalable CPU XC integration in NWChem.
