Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching

Paper Title

Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching

Authors

Xianzhe Xu, Meixia Tao, Cong Shen

Abstract

This paper investigates learning-based caching in small-cell networks (SCNs) when user preference is unknown. The goal is to optimize the cache placement in each small base station (SBS) so as to minimize the long-term transmission delay of the system. We model this sequential multi-agent decision-making problem from a multi-agent multi-armed bandit (MAMAB) perspective. Rather than estimating user preference first and then optimizing the cache strategy, we propose several MAMAB-based algorithms that learn the cache strategy directly online in both stationary and non-stationary environments. In the stationary environment, we first propose two high-complexity agent-based collaborative MAMAB algorithms with performance guarantees. We then propose a low-complexity distributed MAMAB algorithm that ignores SBS coordination. To strike a better balance between SBS coordination gain and computational complexity, we develop an edge-based collaborative MAMAB algorithm that assigns rewards along the edges of the coordination graph. In the non-stationary environment, we modify the MAMAB-based algorithms proposed for the stationary environment by introducing a practical initialization method and designing new perturbed terms that adapt to the dynamic environment. Simulation results are provided to validate the effectiveness of the proposed algorithms. The effects of different parameters on caching performance are also discussed.
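
To make the low-complexity distributed setting more concrete, the following is a minimal sketch in which each SBS learns its cache placement on its own, with no coordination. It is not the algorithm from the paper: it substitutes a standard UCB1 index per file, uses cache hits as a stand-in reward for reduced transmission delay, and assumes one request per SBS per time slot. The function name `run_distributed_ucb` and the parameters `popularity`, `cache_size`, and `horizon` are hypothetical and chosen only for illustration.

```python
import math
import random


def run_distributed_ucb(popularity, cache_size, horizon, seed=0):
    """Illustrative sketch: every SBS independently caches the files
    with the highest UCB1 index (no coordination between SBSs).

    popularity[b][f] is the (unknown to the learner) probability that a
    request arriving at SBS b asks for file f. It is used only to
    simulate requests, never read by the learning rule.
    """
    rng = random.Random(seed)
    num_sbs = len(popularity)
    num_files = len(popularity[0])
    counts = [[0] * num_files for _ in range(num_sbs)]   # times file f was cached at SBS b
    means = [[0.0] * num_files for _ in range(num_sbs)]  # empirical hit rate of file f at SBS b
    cached = [[] for _ in range(num_sbs)]
    total_hits = 0

    for t in range(1, horizon + 1):
        for b in range(num_sbs):
            def ucb_index(f):
                # Files never cached so far are tried first.
                if counts[b][f] == 0:
                    return float("inf")
                bonus = math.sqrt(2.0 * math.log(t) / counts[b][f])
                return means[b][f] + bonus

            # Cache the cache_size files with the largest index.
            ranked = sorted(range(num_files), key=ucb_index, reverse=True)
            cached[b] = ranked[:cache_size]

            # Simulate one user request at this SBS (assumption of the sketch).
            requested = rng.choices(range(num_files), weights=popularity[b])[0]

            # Semi-bandit feedback: only cached files are observed.
            for f in cached[b]:
                reward = 1.0 if f == requested else 0.0
                counts[b][f] += 1
                means[b][f] += (reward - means[b][f]) / counts[b][f]
                total_hits += int(reward)

    hit_rate = total_hits / (horizon * num_sbs)
    return cached, hit_rate


if __name__ == "__main__":
    demo_popularity = [
        [0.6, 0.3, 0.05, 0.03, 0.02],
        [0.02, 0.03, 0.05, 0.3, 0.6],
    ]
    placement, rate = run_distributed_ucb(demo_popularity, cache_size=2, horizon=5000)
    print("final placement per SBS:", placement)
    print("average hit rate:", round(rate, 3))
```

Because each SBS ignores its neighbours, two SBSs that serve overlapping users may cache the same files. That redundancy is the coordination gain that the collaborative variants described in the abstract aim to recover, at a higher computational cost.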
