Paper Title
Cache Allocation in Multi-Tenant Edge Computing via Online Reinforcement Learning
Paper Authors
Paper Abstract
We consider in this work Edge Computing (EC) in a multi-tenant environment: the resource owner, i.e., the Network Operator (NO), virtualizes the resources and lets third-party Service Providers (SPs, the tenants) run their services, which can be diverse and have heterogeneous requirements. Due to confidentiality guarantees, the NO cannot observe the nature of the traffic of the SPs, which is encrypted. This makes resource allocation decisions challenging, since they must be taken based solely on observed monitoring information. We focus on one specific resource, cache space, deployed in some edge node, e.g., a base station. We study the decision of the NO about how to partition the cache among several SPs in order to minimize the upstream traffic. Our goal is to optimize cache allocation using purely data-driven, model-free Reinforcement Learning (RL). Unlike most applications of RL, in which the decision policy is learned offline on a simulator, we assume no previous knowledge is available to build such a simulator. We thus apply RL in an \emph{online} fashion, i.e., the policy is learned by directly perturbing the actual system and monitoring how its performance changes. Since perturbations generate spurious traffic, we also limit them. We show in simulation that our method rapidly converges toward the theoretical optimum; we study its fairness and its sensitivity to several scenario characteristics, and we compare it with a state-of-the-art method.
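To make the online, perturbation-based idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of how an NO could search over cache partitions by perturbing the live allocation and keeping only perturbations that reduce observed upstream traffic. The callback `measure_upstream_traffic` is a hypothetical stand-in for the monitoring information the NO observes; the function name, parameters, and greedy acceptance rule are illustrative assumptions.

```python
import random

def online_cache_allocation(total_cache, num_sps, measure_upstream_traffic,
                            steps=1000, perturb=1):
    """Hypothetical sketch of perturbation-based online cache partitioning.

    measure_upstream_traffic(alloc) is an assumed callback returning the
    upstream traffic observed under allocation `alloc` (a list of slot
    counts, one per SP, summing to `total_cache`).
    """
    # Start from an even split of the cache among the SPs.
    alloc = [total_cache // num_sps] * num_sps
    alloc[0] += total_cache - sum(alloc)  # absorb rounding remainder
    best_traffic = measure_upstream_traffic(alloc)

    for _ in range(steps):
        # Perturb: move a small slice of cache from one SP to another.
        # `perturb` caps the perturbation size, limiting spurious traffic.
        src, dst = random.sample(range(num_sps), 2)
        if alloc[src] < perturb:
            continue
        candidate = alloc[:]
        candidate[src] -= perturb
        candidate[dst] += perturb
        traffic = measure_upstream_traffic(candidate)
        # Keep the perturbation only if it reduced upstream traffic.
        if traffic < best_traffic:
            alloc, best_traffic = candidate, traffic
    return alloc
```

With a synthetic traffic function whose optimum favors the SP with heavier demand, the greedy search drifts toward that optimum, illustrating the online convergence behavior the abstract describes (the actual method in the paper is a model-free RL policy, not this plain hill climb).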