Deep Reinforcement Learning in mmW-NOMA: Joint Power Allocation and Hybrid Beamforming

Paper Title

Deep Reinforcement Learning in mmW-NOMA: Joint Power Allocation and Hybrid Beamforming

Authors

Akbarpour-Kasgari, Abbas; Ardebilipour, Mehrdad

Abstract

The high data-rate demand of next-generation wireless communication can be met by the Non-Orthogonal Multiple Access (NOMA) approach in the millimetre-wave (mmW) frequency band. Meeting this demand requires reducing the interference on other users while maintaining the bit rate, which is done through joint power allocation and beamforming. Furthermore, mmW frequency bands dictate a hybrid beamforming structure because of the trade-off between implementation and performance. In this paper, joint power allocation and hybrid beamforming for mmW-NOMA systems is addressed with Deep Reinforcement Learning (DRL), a recent advance in machine learning and control theory. An actor-critic approach is used to measure the immediate reward and to provide new actions that maximize the overall Q-value of the network. Additionally, to improve stability, we use the Soft Actor-Critic (SAC) approach, in which the overall reward and the action entropy are maximized simultaneously. The immediate reward is defined as a soft-weighted sum of the rates of all users, where the soft weighting is based on the achieved rate and allocated power of each user. The channel responses between the users and the base station (BS) are defined as the state of the environment, while the action space consists of the digital and analog beamforming weights and the power allocated to each user. The simulation results show that the proposed approach outperforms Time-Division Multiple Access (TDMA) and Non-Line of Sight (NLOS)-NOMA in terms of the sum-rate of the users. This gain is attributed to the joint optimization and to the independence of the proposed approach from the channel responses.
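
For reference, the entropy-regularized objective that the abstract alludes to can be written in the standard maximum-entropy form from the SAC literature. This is the generic formulation, not necessarily the exact variant used in the paper. The temperature $\alpha$, the state-action distribution $\rho_\pi$, and the horizon $T$ are notation assumed here, since the abstract does not define them.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

In the setting described in the abstract, $s_t$ would correspond to the channel responses between the users and the BS, $a_t$ to the digital and analog beamforming weights together with the per-user powers, and $r$ to the soft-weighted sum rate.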

Paper Title