Paper Title

Deep Reinforcement Learning for Delay-Oriented IoT Task Scheduling in Space-Air-Ground Integrated Network

Paper Authors

Zhou, Conghao; Wu, Wen; He, Hongli; Yang, Peng; Lyu, Feng; Cheng, Nan; Shen, Xuemin

Paper Abstract

In this paper, we investigate a computing task scheduling problem in space-air-ground integrated network (SAGIN) for delay-oriented Internet of Things (IoT) services. In the considered scenario, an unmanned aerial vehicle (UAV) collects computing tasks from IoT devices and then makes online offloading decisions, in which the tasks can be processed at the UAV or offloaded to the nearby base station or the remote satellite. Our objective is to design a task scheduling policy that minimizes offloading and computing delay of all tasks given the UAV energy capacity constraint. To this end, we first formulate the online scheduling problem as an energy-constrained Markov decision process (MDP). Then, considering the task arrival dynamics, we develop a novel deep risk-sensitive reinforcement learning algorithm. Specifically, the algorithm evaluates the risk, which measures the energy consumption that exceeds the constraint, for each state and searches the optimal parameter weighing the minimization of delay and risk while learning the optimal policy. Extensive simulation results demonstrate that the proposed algorithm can reduce the task processing delay by up to 30% compared to probabilistic configuration methods while satisfying the UAV energy capacity constraint.
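The abstract describes evaluating, per state, a risk that measures energy consumption beyond the UAV's capacity, and searching for a parameter that trades off delay minimization against that risk while learning a policy. The snippet below is a minimal, hypothetical sketch of that idea in a toy tabular setting; the environment, the delay/energy constants, and the crude outer search over the risk weight are illustrative assumptions only, not the paper's deep risk-sensitive RL algorithm.

```python
# Hypothetical minimal sketch (not the paper's implementation): tabular
# Q-learning on a toy offloading MDP with a risk-weighted reward.
# All constants and names (ENERGY_BUDGET, DELAY, ENERGY, ...) are assumed.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 8, 3           # toy states; actions: {UAV, base station, satellite}
ENERGY_BUDGET = 5.0                  # per-episode UAV energy capacity (assumed units)
DELAY = np.array([1.0, 2.0, 6.0])    # assumed per-action delay: local < BS < satellite
ENERGY = np.array([1.5, 0.6, 0.2])   # assumed per-action UAV energy cost

Q = np.zeros((N_STATES, N_ACTIONS))
risk_weight = 1.0                    # parameter trading off delay against risk
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    s, used_energy = rng.integers(N_STATES), 0.0
    for step in range(10):           # ten task arrivals per episode (toy assumption)
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
        used_energy += ENERGY[a]
        # Risk term: only energy consumption that exceeds the budget is penalized.
        risk = max(used_energy - ENERGY_BUDGET, 0.0)
        reward = -(DELAY[a] + risk_weight * risk)
        s_next = rng.integers(N_STATES)            # toy task-arrival dynamics
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    # Crude outer search over the risk weight: tighten it when the energy
    # constraint was violated in this episode, relax it slightly otherwise.
    risk_weight += 0.05 if used_energy > ENERGY_BUDGET else -0.01
    risk_weight = max(risk_weight, 0.0)

print("learned greedy action per state:", Q.argmax(axis=1))
print("final risk weight:", round(risk_weight, 3))
```

In this sketch the risk weight plays the role of the searched trade-off parameter: as it grows, the greedy policy shifts toward low-energy offloading actions even at the cost of higher delay.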
