Paper Title

Model-driven Cluster Resource Management for AI Workloads in Edge Clouds

Paper Authors

Qianlin Liang, Walid A. Hanafy, Ahmed Ali-Eldin, Prashant Shenoy

Paper Abstract

Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this paper, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. We implement a prototype of our system in Kubernetes and show that our system can host 2.3X more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms.
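The core idea the abstract describes, admitting DNN applications onto shared accelerators only when a performance model predicts every co-located tenant will still meet its latency constraint, can be sketched minimally. The interference model below (a fixed per-tenant slowdown factor) and all names (`App`, `Accelerator`, `fits`, `place`) are illustrative assumptions for this sketch, not the paper's actual analytic models or system API.

```python
from dataclasses import dataclass, field
from typing import List, Dict, Optional

@dataclass
class App:
    name: str
    isolated_latency_ms: float  # inference latency when running alone
    slo_ms: float               # latency constraint (SLO)

@dataclass
class Accelerator:
    name: str
    apps: List[App] = field(default_factory=list)

def predicted_latency(app: App, accel: Accelerator,
                      slowdown_per_tenant: float = 0.25) -> float:
    # Toy interference model: each additional co-located tenant
    # inflates latency by a fixed fraction. The paper derives real
    # analytic models per accelerator type; this is a stand-in.
    n_others = sum(1 for a in accel.apps if a is not app)
    return app.isolated_latency_ms * (1 + slowdown_per_tenant * n_others)

def fits(app: App, accel: Accelerator) -> bool:
    # Admitting `app` must keep every tenant, including `app` itself,
    # within its SLO under the predicted interference.
    trial = Accelerator(accel.name, accel.apps + [app])
    return all(predicted_latency(a, trial) <= a.slo_ms for a in trial.apps)

def place(apps: List[App],
          accels: List[Accelerator]) -> Dict[str, Optional[str]]:
    # Model-driven first-fit placement: unlike a capacity-only knapsack
    # packing, admission is gated on predicted latency, not utilization.
    placement: Dict[str, Optional[str]] = {}
    for app in apps:
        for accel in accels:
            if fits(app, accel):
                accel.apps.append(app)
                placement[app.name] = accel.name
                break
        else:
            placement[app.name] = None  # rejected: no feasible accelerator
    return placement
```

For example, two apps with slack SLOs can share one accelerator, while a third app with a tight SLO is steered to an idle one rather than admitted into an interfering co-location.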
