Paper Title

Job Scheduling in Datacenters using Constraint Controlled RL

Paper Authors

Venkataswamy, Vanamala

Paper Abstract

This paper studies a model for online job scheduling in green datacenters. In green datacenters, resource availability depends on the power supply from renewables. Intermittent power supply from renewables leads to intermittent resource availability, inducing job delays (and associated costs). Green datacenter operators must intelligently manage their workloads and available power supply to extract maximum benefits. The scheduler's objective is to schedule jobs on a set of resources to maximize the total value (revenue) while minimizing the overall job delay. A trade-off exists between achieving high job value on the one hand and low expected delays on the other. Hence, the aims of achieving high rewards and low costs are in opposition. In addition, datacenter operators often prioritize multiple objectives, including high system utilization and job completion. To accomplish the opposing goals of maximizing total job value and minimizing job delays, we apply the Proportional-Integral-Derivative (PID) Lagrangian methods in Deep Reinforcement Learning to the job scheduling problem in the green datacenter environment. Lagrangian methods are widely used algorithms for constrained optimization problems. We adopt a controls perspective to learn the Lagrange multiplier with proportional, integral, and derivative control, achieving favorable learning dynamics. Feedback control defines cost terms for the learning agent, monitors the cost limits during training, and continuously adjusts the learning parameters to achieve stable performance. Our experiments demonstrate improved performance compared to scheduling policies without the PID Lagrangian methods. Experimental results illustrate the effectiveness of the Constraint Controlled Reinforcement Learning (CoCoRL) scheduler that simultaneously satisfies multiple objectives.
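To make the PID Lagrangian idea in the abstract concrete, the sketch below shows one common way a feedback-controlled Lagrange multiplier can be updated from measured constraint violations. This is a minimal illustrative example, not the paper's implementation; the class name, gain values, and the job-delay cost interpretation are assumptions for illustration only.

```python
# Minimal sketch (assumption, not the paper's code) of a PID-controlled
# Lagrange multiplier: the multiplier weighting the cost (e.g., job delay)
# is driven by proportional, integral, and derivative feedback on how far
# the measured cost exceeds its limit.

class PIDLagrangeMultiplier:
    def __init__(self, cost_limit, kp=0.05, ki=0.005, kd=0.1):
        self.cost_limit = cost_limit   # allowed cost per episode (assumed: a job-delay budget)
        self.kp, self.ki, self.kd = kp, ki, kd  # assumed example gains
        self.integral = 0.0            # accumulated constraint violation
        self.prev_cost = 0.0           # previous measured cost, for the derivative term
        self.value = 0.0               # current multiplier, kept non-negative

    def update(self, episode_cost):
        """Adjust the multiplier from the latest measured episodic cost."""
        error = episode_cost - self.cost_limit           # proportional term: current violation
        self.integral = max(0.0, self.integral + error)  # integral term: persistent violation
        derivative = max(0.0, episode_cost - self.prev_cost)  # derivative term: rising cost
        self.prev_cost = episode_cost
        self.value = max(0.0, self.kp * error
                              + self.ki * self.integral
                              + self.kd * derivative)
        return self.value
```

In a constrained RL training loop, the multiplier returned by `update` would typically weight the cost term in the agent's objective (roughly, maximize reward minus multiplier times cost), so the scheduler trades job value against job delay according to the feedback signal; how CoCoRL integrates this with its scheduling policy is detailed in the paper itself.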
