Title
Com-DDPG: A Multiagent Reinforcement Learning-based Offloading Strategy for Mobile Edge Computing
Authors
Abstract
The development of mobile services has given rise to a variety of computation-intensive and time-sensitive applications, such as recommendation systems and daily payment methods. However, competition among computing tasks for limited resources increases the task processing latency and energy consumption of mobile devices and tightens time constraints. Mobile edge computing (MEC) has been widely used to address these problems, but existing computation offloading methods have limitations. On the one hand, they focus on independent tasks rather than dependent tasks; the challenges of task dependency in the real world, especially task segmentation and integration, remain to be addressed. On the other hand, multiuser scenarios involving resource allocation and mutex access must be considered. In this paper, we propose a novel offloading approach, Com-DDPG, which uses multiagent reinforcement learning to enhance offloading performance in MEC. First, we formulate the task dependency model, task priority model, energy consumption model, and average latency from the perspective of server clusters and multiple dependencies among mobile tasks. Based on these models, our method formalizes communication behavior among multiple agents; reinforcement learning is then executed to obtain the offloading strategy. Because the state information is incomplete, long short-term memory (LSTM) is employed as a decision-making tool to assess the internal state. Moreover, to support effective actions, we use a bidirectional recurrent neural network (BRNN) to learn and enhance features obtained from the agents' communication. Finally, we conduct simulation experiments on the Alibaba cluster dataset. The results show that our method outperforms other baselines in terms of energy consumption, load status, and latency.
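The abstract's network pipeline — a BRNN that encodes the agents' exchanged messages into per-agent features, followed by an LSTM that accumulates those features into an internal state under partial observability — can be sketched roughly as follows. This is a minimal numpy illustration of the two building blocks only; all shapes, weight initializations, and function names are illustrative assumptions, not the authors' Com-DDPG implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias. Returns the new hidden and cell states."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:])      # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def birnn_features(msgs, Wf, Wb, Uf, Ub):
    """Toy bidirectional RNN over a sequence of agent messages:
    run a tanh RNN forward and backward, concatenate per step."""
    H = Uf.shape[0]
    hf, hb = np.zeros(H), np.zeros(H)
    fwd, bwd = [], []
    for m in msgs:                      # forward pass
        hf = np.tanh(Wf @ m + Uf @ hf)
        fwd.append(hf)
    for m in reversed(msgs):            # backward pass
        hb = np.tanh(Wb @ m + Ub @ hb)
        bwd.append(hb)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
D, H, n_agents = 6, 4, 3                # message dim, hidden dim, #agents
msgs = [rng.normal(size=D) for _ in range(n_agents)]

# 1) BRNN encodes the agents' communication into per-agent features.
feats = birnn_features(msgs,
                       rng.normal(size=(H, D)), rng.normal(size=(H, D)),
                       rng.normal(size=(H, H)), rng.normal(size=(H, H)))

# 2) LSTM consumes those features to maintain an internal state
#    despite incomplete (partially observed) environment state.
W = rng.normal(size=(4 * H, 2 * H))     # input is the 2H-dim BRNN feature
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for f in feats:
    h, c = lstm_step(f, h, c, W, U, b)
print(h.shape)  # (4,)
```

In the actual method, the resulting hidden state would feed an actor-critic (DDPG-style) head that selects the offloading action; that part is omitted here.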