Paper Title

Fairness-Oriented User Scheduling for Bursty Downlink Transmission Using Multi-Agent Reinforcement Learning

Authors

Mingqi Yuan, Qi Cao, Man-on Pun, Yi Chen

Abstract

In this work, we develop practical user scheduling algorithms for downlink bursty traffic with an emphasis on user fairness. In contrast to conventional scheduling algorithms that either divide the transmission time slots equally among users or maximize certain ratios without physical meaning, we propose to use the 5%-tile user data rate (5TUDR) as the metric to evaluate user fairness. Since it is difficult to optimize 5TUDR directly, we first cast the problem into the stochastic game framework and subsequently propose a Multi-Agent Reinforcement Learning (MARL)-based algorithm to perform distributed optimization of the resource block group (RBG) allocation. Furthermore, each MARL agent is designed to take information measured by network counters from multiple network layers (e.g., Channel Quality Indicator and buffer size) as its input state, with the RBG allocation as its action and a proposed reward function designed to maximize 5TUDR. Extensive simulations show that the proposed MARL-based scheduler achieves fair scheduling while maintaining good average network throughput compared to conventional schedulers.
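The core ingredients described in the abstract are the 5TUDR fairness metric and a per-agent state/action/reward structure built from cross-layer network counters. The Python sketch below is an illustrative interpretation only: the function names, the normalization constants, and the choice of the 5TUDR itself as the reward are assumptions made for demonstration, not the authors' actual design.

```python
# Illustrative sketch, NOT the paper's implementation.
# Shows how the 5%-tile user data rate (5TUDR) could be computed and how a
# per-agent observation and reward might be assembled from network counters
# such as CQI and buffer size. All names, shapes, and normalization constants
# below are hypothetical.

import numpy as np

def five_tile_user_data_rate(user_rates):
    """Return the 5th percentile of per-user data rates (5TUDR)."""
    return float(np.percentile(np.asarray(user_rates, dtype=float), 5))

def build_agent_state(cqi, buffer_size, max_cqi=15.0, max_buffer=1e6):
    """Normalize cross-layer counters into a per-agent observation vector
    (normalization constants are assumptions for illustration)."""
    return np.array([cqi / max_cqi, buffer_size / max_buffer], dtype=float)

def reward(user_rates):
    """Toy reward: the 5TUDR itself, so agents are pushed to raise the
    worst-off users' rates (the paper's reward shaping may differ)."""
    return five_tile_user_data_rate(user_rates)

if __name__ == "__main__":
    # Synthetic per-user rates in Mbps, purely for demonstration.
    rates = np.random.exponential(scale=10.0, size=50)
    print("5TUDR:", five_tile_user_data_rate(rates))
    print("agent state:", build_agent_state(cqi=11, buffer_size=2.4e5))
    print("reward:", reward(rates))
```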
