Paper Title
Deep Reinforcement Learning Based Dynamic Power and Beamforming Design for Time-Varying Wireless Downlink Interference Channel
Paper Authors
Paper Abstract
With the rapid development of wireless communication techniques, they are widely used in various fields for convenient and efficient data transmission. Different from the commonly used assumption of a time-invariant wireless channel, we focus on the time-varying wireless downlink channel in order to get closer to practical situations. Our objective is to maximize the sum rate in the time-varying channel under constraints on the cut-off signal-to-interference-plus-noise ratio (SINR), transmit power, and beamforming. To adapt to the rapidly changing channel, we abandon the frequently used convex optimization approach and instead apply deep reinforcement learning algorithms in this paper. From the view of ordinary measures such as power control, interference coordination, and beamforming, the continuous variation of these measures should be taken into consideration, while the sparse reward problem caused by the early termination of episodes is an important bottleneck that should not be ignored. Therefore, based on an analysis of relevant algorithms, we propose two algorithms in our work: the Deep Deterministic Policy Gradient (DDPG) algorithm and hierarchical DDPG. As for these two algorithms, to overcome the limitation of discrete outputs, DDPG is established by combining the Actor-Critic algorithm with Deep Q-learning (DQN), so that it can output continuous actions without sacrificing the existing advantages brought by DQN and can also improve performance. Moreover, to address the challenge of sparse rewards, we take advantage of a meta policy, borrowed from the idea of hierarchical reinforcement learning, to divide the single agent in DDPG into one meta-controller and one controller, yielding hierarchical DDPG. Our simulation results demonstrate that the proposed DDPG and hierarchical DDPG perform well in terms of coverage, convergence, and sum rate performance.
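The abstract does not state the exact system model, so the following is only an assumed formulation of the kind of constrained sum-rate maximization it describes; all symbols (the channel vector h_k[t], the beamforming vector w_k[t], the cut-off SINR gamma_min, and the power budget P_max) are introduced here purely for illustration:

\[
\begin{aligned}
\max_{\{\mathbf{w}_k[t]\}} \quad & \sum_{k=1}^{K} \log_2\bigl(1 + \mathrm{SINR}_k[t]\bigr),
\qquad \mathrm{SINR}_k[t] = \frac{\bigl|\mathbf{h}_k^{H}[t]\,\mathbf{w}_k[t]\bigr|^2}{\sum_{j \neq k} \bigl|\mathbf{h}_k^{H}[t]\,\mathbf{w}_j[t]\bigr|^2 + \sigma^2}, \\
\text{s.t.} \quad & \mathrm{SINR}_k[t] \ge \gamma_{\min} \ \ \forall k,
\qquad \sum_{k=1}^{K} \bigl\|\mathbf{w}_k[t]\bigr\|^2 \le P_{\max},
\end{aligned}
\]

where \(\mathbf{h}_k[t]\) is the time-varying downlink channel of user \(k\), \(\mathbf{w}_k[t]\) its beamforming vector (absorbing the transmit power), \(\sigma^2\) the noise power, \(\gamma_{\min}\) the cut-off SINR, and \(P_{\max}\) the total power budget.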
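Likewise, as a minimal sketch of the hierarchical DDPG idea in the abstract (a meta-controller that picks a goal and a DDPG controller that outputs continuous power/beamforming actions, trained with an actor-critic update and a DQN-style TD target), the agent could be organized as below. This is not the authors' implementation: PyTorch, all class names, layer sizes, dimensions, and the discrete-goal meta-controller are assumptions.

# Hypothetical sketch of a hierarchical DDPG agent; sizes are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM = 16, 4, 8  # assumed dimensions


class Actor(nn.Module):
    """Deterministic policy: (state, goal) -> continuous action in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + GOAL_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))


class Critic(nn.Module):
    """Q-value of (state, goal, action), trained with a DQN-style TD target."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + GOAL_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, goal, action):
        return self.net(torch.cat([state, goal, action], dim=-1))


class MetaController(nn.Module):
    """Picks a discrete goal (e.g. a target SINR level) from the state."""
    def __init__(self, num_goals=GOAL_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, num_goals),
        )

    def forward(self, state):
        return self.net(state)  # one Q-value per candidate goal


def ddpg_update(actor, critic, target_actor, target_critic, batch,
                actor_opt, critic_opt, gamma=0.99):
    """One DDPG-style update of the low-level controller."""
    s, g, a, r, s2, done = batch
    with torch.no_grad():
        q_next = target_critic(s2, g, target_actor(s2, g))
        y = r + gamma * (1.0 - done) * q_next        # TD target, as in DQN
    critic_loss = nn.functional.mse_loss(critic(s, g, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, g, actor(s, g)).mean()   # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

In a hedged reading of the abstract, the meta-controller would be trained on the sparse extrinsic reward (e.g. the sum rate of episodes that are not terminated early), while the controller receives a denser intrinsic reward for reaching the chosen goal, which is how splitting one agent into a meta-controller and a controller is meant to relieve the sparse-reward bottleneck.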