论文标题

算法1019:一种基于任务的多移QR/QZ算法,具有激进的早期通货紧缩

Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

论文作者

Myllykoski, Mirko

论文摘要

QR算法是计算特征值和密度非对称矩阵的特征向量的三个阶段之一。本文介绍了一种基于任务的QR算法,用于将上部的Hessenberg矩阵减少到真实的Schur形式。基于任务的算法还支持广义特征值问题(QZ算法),但本文集中在标准案例上。基于任务的算法采用了以前的算法改进,例如紧密耦合的多移和激进的早期放气(AED),还结合了几种可显着提高性能的新想法。这包括但不限于消除几个同步点,以前独立的计算步骤的动态合并,临界路径的缩短和优先级以及实验性GPU支持。在基于Intel和AMD CPU的两台不同机器上,基于任务的实现比多线和Scalapack的多次多次相比多次多次。该实现建立在Starpu运行时系统之上,是开源Starneig库的一部分。

The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源