Path Design and Resource Management for NOMA-Enhanced Indoor Intelligent Robots

Paper Title

Path Design and Resource Management for NOMA enhanced Indoor Intelligent Robots

Authors

Zhong, Ruikang; Liu, Xiao; Liu, Yuanwei; Chen, Yue; Wang, Xianbin

Abstract

A communication-enabled service framework for indoor intelligent robots (IRs) is proposed, in which the non-orthogonal multiple access (NOMA) technique is adopted to enable highly reliable communications. In conjunction with the state-of-the-art indoor channel model recently proposed by the International Telecommunication Union (ITU), a Lego modeling method is proposed that deterministically describes the indoor layout and channel state in order to construct a radio map. The resulting radio map serves as a virtual environment for training the reinforcement learning agent, which saves training time and hardware costs. Building on the proposed communication model, the motions of IRs that need to reach designated mission destinations and their corresponding downlink power allocation policy are jointly optimized to maximize the mission efficiency and communication reliability of the IRs. To solve this optimization problem, a novel reinforcement learning approach named the deep transfer deterministic policy gradient (DT-DPG) algorithm is proposed. Our simulation results demonstrate that: 1) with the aid of NOMA techniques, the communication reliability of IRs is effectively improved; 2) the radio map is qualified to serve as a virtual training environment, and its statistical channel state information improves training efficiency by about 30%; 3) the proposed DT-DPG algorithm outperforms the conventional deep deterministic policy gradient (DDPG) algorithm in terms of optimization performance, training time, and the ability to escape local optima.
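The abstract mentions downlink NOMA power allocation without stating a rate model. As a hedged illustration only, the sketch below computes achievable rates for a generic textbook two-user downlink NOMA pair with perfect successive interference cancellation (SIC). The function name `noma_two_user_rates`, its parameters, and the two-user setup are assumptions for illustration, not the model or algorithm used in the paper.

```python
import math


def noma_two_user_rates(p_total, a_far, g_near, g_far, noise):
    """Achievable rates (bit/s/Hz) for a generic two-user downlink NOMA pair.

    Assumed textbook model (illustrative, not the paper's):
    - the far (weak-channel) user gets power share a_far and decodes its own
      signal while treating the near user's signal as interference;
    - the near (strong-channel) user first decodes and removes the far user's
      signal via SIC (assumed perfect), then decodes its own signal with
      only noise remaining.
    """
    if not 0.0 < a_far < 1.0:
        raise ValueError("a_far must be strictly between 0 and 1")
    if g_near < g_far:
        raise ValueError("the near user must have the stronger channel gain")
    a_near = 1.0 - a_far
    # Far user: SINR with the near user's superposed signal as interference.
    sinr_far = a_far * p_total * g_far / (a_near * p_total * g_far + noise)
    # Near user: after perfect SIC, only thermal noise remains.
    snr_near = a_near * p_total * g_near / noise
    return math.log2(1.0 + snr_near), math.log2(1.0 + sinr_far)


if __name__ == "__main__":
    near, far = noma_two_user_rates(1.0, 0.8, 10.0, 1.0, 0.1)
    print(f"near user: {near:.2f} bit/s/Hz, far user: {far:.2f} bit/s/Hz")
```

With total power 1, a far-user share of 0.8, channel gains 10 and 1, and noise power 0.1, the near user obtains log2(21), about 4.39 bit/s/Hz, and the far user log2(1 + 0.8/0.3), about 1.87 bit/s/Hz. Giving the larger power share to the weaker user is the usual convention that lets both users be served on the same resource block.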
