Paper Title
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism
Authors
Abstract
A good parallelization strategy can significantly improve the efficiency or reduce the cost of the distributed training of deep neural networks (DNNs). Recently, several methods have been proposed to find efficient parallelization strategies, but they all optimize a single objective (e.g., execution time, memory consumption) and produce only one strategy. We propose FT, an efficient algorithm that searches for an optimal set of parallelization strategies to allow trade-offs among different objectives. FT can adapt to different scenarios, minimizing memory consumption when the number of devices is limited and fully utilizing additional resources to reduce execution time. For popular DNN models (e.g., vision, language), an in-depth analysis is conducted to understand the trade-offs among different objectives and their influence on the parallelization strategies. We also develop a user-friendly system, called TensorOpt, which allows users to run their distributed DNN training jobs without caring about the details of parallelization strategies. Experimental results show that FT runs efficiently and provides accurate estimation of runtime costs, and that TensorOpt is more flexible in adapting to resource availability than existing frameworks.
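The abstract's key point is that FT returns a set of Pareto-optimal parallelization strategies rather than a single plan, so users can trade execution time against memory consumption. As a rough illustration of what such a set means (this is not FT itself or TensorOpt's API; all names and numbers below are hypothetical), a minimal Python sketch of filtering candidate strategies down to a time/memory Pareto frontier could look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Strategy:
    name: str          # hypothetical identifier of a parallelization plan
    exec_time: float   # estimated per-iteration execution time (s)
    memory: float      # estimated peak per-device memory (GB)

def pareto_frontier(candidates: List[Strategy]) -> List[Strategy]:
    """Keep strategies that are not dominated in both execution time and memory."""
    frontier = []
    for s in candidates:
        dominated = any(
            o.exec_time <= s.exec_time and o.memory <= s.memory
            and (o.exec_time < s.exec_time or o.memory < s.memory)
            for o in candidates
        )
        if not dominated:
            frontier.append(s)
    # Sort by memory so a user can scan for the fastest plan within a budget.
    return sorted(frontier, key=lambda s: s.memory)

def pick_fastest_within_budget(frontier: List[Strategy], memory_budget: float) -> Strategy:
    """Choose the fastest strategy whose memory fits the per-device budget."""
    feasible = [s for s in frontier if s.memory <= memory_budget]
    return min(feasible, key=lambda s: s.exec_time)
```

The actual FT algorithm searches the strategy space efficiently rather than enumerating all candidates as done here; the sketch only illustrates how a frontier of strategies lets a user minimize memory when devices are scarce or minimize execution time when more resources are available.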