Paper Title
Enhancing Scalability of a Matrix-Free Eigensolver for Studying Many-Body Localization
Paper Authors
Paper Abstract
In [Van Beeumen et al., HPC Asia 2020, https://www.doi.org/10.1145/3368474.3368497], a scalable and matrix-free eigensolver was proposed for studying the many-body localization (MBL) transition of two-level quantum spin chain models with nearest-neighbor $XX+YY$ interactions plus $Z$ terms. This type of problem is computationally challenging because the vector space dimension grows exponentially with the physical system size, and averaging over different configurations of the random disorder is needed to obtain relevant statistical behavior. For each eigenvalue problem, eigenvalues from different regions of the spectrum and their corresponding eigenvectors need to be computed. Traditionally, the interior eigenstates for a single eigenvalue problem are computed via the shift-and-invert Lanczos algorithm. Due to the extremely high memory footprint of the LU factorizations, this technique is not well suited for a large number of spins $L$; e.g., one needs thousands of compute nodes on modern high-performance computing infrastructures to go beyond $L = 24$. The matrix-free approach does not suffer from this memory bottleneck; however, its scalability is limited by a computation and communication imbalance. We present a few strategies to reduce this imbalance and to significantly enhance the scalability of the matrix-free eigensolver. To optimize the communication performance, we leverage the consistent space runtime, CSPACER, and show its efficiency in accelerating the MBL irregular communication patterns at scale compared to optimized MPI non-blocking two-sided and one-sided RMA implementation variants. The efficiency and effectiveness of the proposed algorithm are demonstrated by computing eigenstates on a massively parallel many-core high-performance computer.
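To make the term "matrix-free" concrete, the following is a minimal serial sketch of a Hamiltonian-vector product that never stores the matrix. It is not the authors' implementation. It assumes a simplified model $H = \sum_i (\sigma^x_i \sigma^x_{i+1} + \sigma^y_i \sigma^y_{i+1}) + \sum_i h_i \sigma^z_i$ with Pauli matrices, unit coupling, open boundaries, and only on-site $Z$ fields (the model in the paper may contain further $Z$ terms). The function name `apply_hamiltonian` and the bit encoding (bit $i$ of the basis index set means spin $i$ is down) are illustrative choices.

```python
def apply_hamiltonian(x, h):
    """Return y = H x for the assumed XX+YY chain with on-site Z fields.

    x : list of 2**L amplitudes in the computational (sigma^z) basis
    h : list of L on-site field strengths (the random disorder)
    The matrix H is never formed; each row is generated on the fly.
    """
    L = len(h)
    dim = 1 << L
    if len(x) != dim:
        raise ValueError("x must have length 2**len(h)")

    y = [0.0] * dim
    for s in range(dim):
        # Diagonal part: sigma^z_i has eigenvalue +1 if bit i is 0, else -1.
        diag = 0.0
        for i in range(L):
            diag += h[i] * (1 - 2 * ((s >> i) & 1))
        acc = diag * x[s]

        # Off-diagonal part: XX+YY swaps an anti-aligned neighbor pair
        # with matrix element 2 and annihilates an aligned pair.
        for i in range(L - 1):
            if ((s >> i) & 1) != ((s >> (i + 1)) & 1):
                partner = s ^ (0b11 << i)  # flip bits i and i+1
                acc += 2.0 * x[partner]
        y[s] = acc
    return y
```

For two spins with `h = [0.5, -0.25]`, `apply_hamiltonian([1.0, 2.0, 3.0, 4.0], h)` returns `[0.25, 4.5, 6.25, -1.0]`. In a distributed setting the `partner` index may belong to a different process than `s`, which is one plausible way to picture the irregular communication pattern the abstract refers to.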