Paper Title

NumS: Scalable Array Programming for the Cloud

Paper Authors

Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

Paper Abstract

Scientists increasingly rely on Python tools to perform scalable distributed memory array operations using rich, NumPy-like expressions. However, many of these tools rely on dynamic schedulers optimized for abstract task graphs, which often encounter memory and network bandwidth-related bottlenecks due to sub-optimal data and operator placement decisions. Tools built on the message passing interface (MPI), such as ScaLAPACK and SLATE, have better scaling properties, but these solutions require specialized knowledge to use. In this work, we present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems. This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS). LSHS is a local search method that optimizes operator placement by minimizing the maximum memory and network load on any given node within a distributed system. Coupled with a heuristic for load-balanced data layouts, our approach is capable of attaining communication lower bounds on some common numerical operations, and our empirical study shows that LSHS enhances performance on Ray by decreasing network load by a factor of 2, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem. On terabyte-scale data, NumS achieves performance competitive with SLATE on DGEMM, up to a 20x speedup over Dask on a key operation for tensor factorization, and a 2x speedup on logistic regression compared to Dask ML and Spark's MLlib.
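The abstract describes LSHS as a local search that places each operator on the node minimizing the maximum simulated memory and network load. The sketch below illustrates that placement idea only; it is not NumS's actual implementation, and the Node/Op classes, the single-level (non-hierarchical) load model, and all sizes are illustrative assumptions.

```python
# Minimal sketch of load-simulated operator placement: for each operator,
# simulate the memory and network load every candidate node would carry
# and greedily pick the node minimizing the maximum simulated load.
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    mem_load: float = 0.0   # bytes of block data resident on this node
    net_load: float = 0.0   # bytes transferred to this node so far


@dataclass
class Op:
    out_bytes: float                                      # size of output block
    input_locations: dict = field(default_factory=dict)   # node_id -> input bytes


def place(op: Op, nodes: list[Node]) -> Node:
    """Choose the node minimizing max(memory, network) load after `op` runs there."""
    def simulated_cost(node: Node) -> float:
        # Inputs not already resident on `node` must cross the network.
        transfer = sum(size for nid, size in op.input_locations.items()
                       if nid != node.node_id)
        return max(node.mem_load + op.out_bytes, node.net_load + transfer)

    best = min(nodes, key=simulated_cost)
    # Commit the placement by updating the simulated loads.
    best.mem_load += op.out_bytes
    best.net_load += sum(size for nid, size in op.input_locations.items()
                         if nid != best.node_id)
    return best


if __name__ == "__main__":
    cluster = [Node(i) for i in range(4)]
    # An elementwise op whose two 8 MB input blocks live on nodes 0 and 1.
    op = Op(out_bytes=8e6, input_locations={0: 8e6, 1: 8e6})
    print(f"placed on node {place(op, cluster).node_id}")
```

Note the design choice this models: because the cost is the maximum of memory and network load rather than their sum, the greedy step avoids overloading any single node on either resource, which is what the paper credits for the reduced network load and memory footprint relative to dynamic task-graph schedulers.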
