Paper Title
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
Paper Authors
Paper Abstract
Storage disaggregation underlies today's cloud and is naturally complemented by pushing down some computation to storage, thus mitigating the potential network bottleneck between the storage and compute tiers. We show how ML training benefits from storage pushdowns by focusing on transfer learning (TL), the widespread technique that democratizes ML by reusing existing knowledge on related tasks. We propose HAPI, a new TL processing system centered around two complementary techniques that address challenges introduced by disaggregation. First, applications must carefully balance execution across tiers for performance. HAPI judiciously splits the TL computation during the feature extraction phase, yielding pushdowns that not only improve network time but also improve total TL training time by overlapping the execution of consecutive training iterations across tiers. Second, operators want resource efficiency from the storage-side computational resources. HAPI employs storage-side batch size adaptation, allowing increased storage-side pushdown concurrency without affecting training accuracy. HAPI yields up to 2.5x training speed-up while choosing, in 86.8% of cases, the best-performing split point or one that is at most 5% off from the best.
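To make the two ideas in the abstract concrete, the sketch below illustrates (1) splitting a frozen feature extractor at a layer index, so that the prefix could run as a storage-side pushdown and the suffix on the compute tier, and (2) storage-side batch size adaptation via micro-batching. This is a minimal illustration only; the model choice (ResNet-18), the split index, and the `split_feature_extractor` helper are assumptions for exposition and do not reflect HAPI's actual interface.

```python
import torch
import torch.nn as nn
from torchvision import models

def split_feature_extractor(split_idx: int):
    """Illustrative split of a pretrained backbone into a storage-side prefix
    and a compute-side suffix (not HAPI's actual API)."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.eval()                                     # frozen feature extraction in TL
    layers = list(backbone.children())[:-1]             # drop the final FC (task head)
    storage_side = nn.Sequential(*layers[:split_idx])   # candidate pushdown to the storage tier
    compute_side = nn.Sequential(*layers[split_idx:])   # remains on the compute tier
    return storage_side, compute_side

storage_side, compute_side = split_feature_extractor(split_idx=6)  # hypothetical split point
batch = torch.randn(8, 3, 224, 224)                                # hypothetical training batch

with torch.no_grad():
    # Storage-side batch size adaptation (illustrative): run the pushdown in
    # smaller micro-batches to raise concurrency, then concatenate the results.
    # With a frozen (eval-mode) backbone, the outputs are independent of the
    # micro-batch size, so accuracy is unaffected.
    intermediate = torch.cat([storage_side(mb) for mb in batch.split(4)])
    features = compute_side(intermediate).flatten(1)    # feeds the trainable task head

print(intermediate.shape, features.shape)               # intermediate tensor crosses the network
```

Choosing the split index trades off the size of the intermediate tensor shipped over the network against the amount of computation placed on the storage tier, which is the balance the abstract says HAPI navigates.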