DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization

Paper title

DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization

Authors

Sambit Das, Phani Motamarri, Vishal Subramanian, David M. Rogers, Vikram Gavini

Abstract

We present DFT-FE 1.0, building on DFT-FE 0.6 [Comput. Phys. Commun. 246, 106853 (2020)], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching $\sim 100{,}000$ electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation -- via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency -- as well as in high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces, and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our implementation, which yields a $\sim 20 \times$ CPU-GPU speed-up by using GPU acceleration on hybrid CPU-GPU nodes. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of $80$ to $140$ seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing $\sim 6{,}000$ to $15{,}000$ electrons.
