Paper title
DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization
Authors
Abstract
We present DFT-FE 1.0, building on DFT-FE 0.6 [Comput. Phys. Commun. 246, 106853 (2020)], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching $\sim 100{,}000$ electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation, via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency, as well as in high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our implementation, which yields a $\sim 20\times$ CPU-GPU speed-up by using GPU acceleration on hybrid CPU-GPU nodes. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of $80$ to $140$ seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing $\sim 6{,}000$ to $15{,}000$ electrons.