Paper Title
Affinity-Aware Resource Provisioning for Long-Running Applications in Shared Clusters
Paper Authors
Paper Abstract
Resource provisioning plays a pivotal role in determining the right amount of infrastructure resources to run applications while advancing global decarbonization goals. A significant portion of production clusters is now dedicated to long-running applications (LRAs), which typically take the form of microservices and execute for hours or even months. It is therefore practically important to plan the placement of LRAs in a shared cluster ahead of time, so that the number of compute nodes they require can be minimized to reduce carbon footprint and lower operational costs. Existing works on LRA scheduling are often application-agnostic and do not specifically address the constraining requirements imposed by LRAs, such as co-location affinity constraints and time-varying resource requirements. In this paper, we present an affinity-aware resource provisioning approach for deploying large-scale LRAs in a shared cluster subject to multiple constraints, with the objective of minimizing the number of compute nodes in use. We investigate a broad range of solution algorithms that fall into three main categories: Application-Centric, Node-Centric, and Multi-Node approaches, and tune them for typical large-scale real-world scenarios. Experimental studies driven by the Alibaba Tianchi dataset show that our algorithms achieve competitive scheduling effectiveness and running time compared with the heuristics used by recent works, including Medea and LraSched.
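To make the problem setting concrete, the following is a minimal, hypothetical sketch of a Node-Centric greedy placement in the spirit the abstract describes: containers are packed onto as few nodes as possible, subject to per-timeslot capacity (time-varying demand) and pairwise anti-affinity constraints. All names, data shapes, and the first-fit heuristic are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch only: a first-fit, node-centric placement under
# time-varying demand and anti-affinity constraints. Not the paper's method.
from dataclasses import dataclass, field


@dataclass
class Container:
    app: str            # the LRA this container belongs to
    demand: list        # resource demand per timeslot (e.g. CPU shares)


@dataclass
class Node:
    capacity: float
    containers: list = field(default_factory=list)

    def fits(self, c: Container, anti_affinity: dict) -> bool:
        # Anti-affinity: apps that may not share a node with c.app.
        if any(other.app in anti_affinity.get(c.app, set())
               for other in self.containers):
            return False
        # Capacity must hold at every timeslot, not just on average.
        for t, d in enumerate(c.demand):
            used = sum(other.demand[t] for other in self.containers)
            if used + d > self.capacity:
                return False
        return True


def place_first_fit(containers, capacity, anti_affinity):
    """Open a new node only when no existing node can host the container,
    so the number of nodes in use tends toward the minimum."""
    nodes = []
    for c in containers:
        for n in nodes:
            if n.fits(c, anti_affinity):
                n.containers.append(c)
                break
        else:
            n = Node(capacity)
            n.containers.append(c)
            nodes.append(n)
    return nodes
```

For example, with two containers of app `A` and one of app `B`, each demanding 2 units in both timeslots on nodes of capacity 4, and a symmetric `A`/`B` anti-affinity, the two `A` containers share one node while `B` is forced onto a second.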