Paper Title

Elastic execution of checkpointed MPI applications

Authors

Gajjar, Sumeet; Vaidya, Saurabh

Abstract

MPI applications begin with a fixed number of ranks and, by default, that number remains constant throughout the application's lifetime. The developer can choose to increase the number of ranks by dynamically spawning MPI processes; however, doing this manually adds complexity to the MPI application. Making MPI applications malleable \cite{b20} would give HPC applications the same elasticity as cloud applications. We propose multiple approaches to change the number of ranks of an MPI program transparently, without modifying the user code. We use checkpointing as the tool that makes the number of ranks mutable: execution is halted and the MPI program is resumed with a new state. In this paper, we focus on the scenario of increasing the number of ranks of an MPI program, using ExaMPI as the MPI implementation.