Paper Title


A Survey of Multi-Tenant Deep Learning Inference on GPU

Authors

Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Chenchen Liu, Xiang Chen

Abstract


Deep Learning (DL) models have achieved superior performance. Meanwhile, computing hardware such as NVIDIA GPUs has also demonstrated strong scaling trends, with roughly 2x throughput and memory bandwidth per generation. With such strong compute scaling, multi-tenant deep learning inference, which co-locates multiple DL models on the same GPU, is becoming widely deployed to improve resource utilization, increase serving throughput, and reduce energy cost. However, achieving efficient multi-tenant DL inference is challenging and requires thorough full-stack system optimization. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPUs. By overviewing the entire optimization stack, summarizing multi-tenant computing innovations, and elaborating on recent technological advances, we hope this survey can shed light on new optimization perspectives and motivate novel works in future large-scale DL system optimization.
