Multi-Agent Reinforcement Learning in NOMA-aided UAV Networks for Cellular Offloading

Paper Title

Multi-Agent Reinforcement Learning in NOMA-aided UAV Networks for Cellular Offloading

Authors

Ruikang Zhong, Xiao Liu, Yuanwei Liu, Yue Chen

Abstract

A novel framework is proposed for cellular offloading with the aid of multiple unmanned aerial vehicles (UAVs), in which the non-orthogonal multiple access (NOMA) technique is employed at each UAV to further improve the spectrum efficiency of the wireless network. An optimization problem of joint three-dimensional (3D) trajectory design and power allocation is formulated to maximize the throughput. Since ground mobile users are considered to roam continuously, the UAVs need to be redeployed in a timely manner based on the movement of the users. To solve this dynamic problem, a K-means based clustering algorithm is first adopted to periodically partition the users. Afterward, a mutual deep Q-network (MDQN) algorithm is proposed to jointly determine the optimal 3D trajectory and power allocation of the UAVs. In contrast to the conventional DQN algorithm, the MDQN algorithm enables the experiences of multiple agents to be input into a shared neural network, shortening the training time with the assistance of state abstraction. Numerical results demonstrate that: 1) the proposed MDQN algorithm is capable of converging under minor constraints and has a faster convergence rate than the conventional DQN algorithm in the multi-agent case; 2) the achievable sum rate of the NOMA-enhanced UAV network is 23% higher than that of the orthogonal multiple access (OMA) case; 3) by designing the optimal 3D trajectory of the UAVs with the aid of the MDQN algorithm, the sum rate of the network is 142% and 56% higher than that obtained with the circular trajectory and the 2D trajectory, respectively.
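The user-partitioning step mentioned in the abstract can be illustrated with a minimal sketch of plain K-means (Lloyd's algorithm) on 2D ground positions. This is not the authors' implementation: the abstract does not give the clustering details, so the function name `kmeans`, the sample coordinates, the choice of two clusters (one per hypothetical UAV), and the fixed iteration count are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Partition 2D points into k clusters with Lloyd's algorithm (illustrative)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the user positions
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical user positions (arbitrary units): two well-separated groups.
users = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
         (10.0, 10.0), (11.0, 10.5), (10.5, 11.0)]
labels, centroids = kmeans(users, k=2)
print(labels)     # cluster index per user
print(centroids)  # one centroid per cluster, e.g. a candidate UAV hover point
```

Because the users roam, such a partition would have to be recomputed periodically, as the abstract notes; how often, and how the result feeds the MDQN agents, is not specified there.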
