Paper Title

Traffic Prediction and Random Access Control Optimization: Learning and Non-learning based Approaches

Authors

Nan Jiang, Yansha Deng, Arumugam Nallanathan

Abstract

Random access schemes in modern wireless communications are generally based on framed-ALOHA (f-ALOHA), which can be optimized by flexibly organizing devices' transmissions and re-transmissions. However, this optimization is generally intractable due to the lack of information about complex traffic generation statistics and the occurrence of random collisions. In this article, we first summarize the general structure of access control optimization for different random access schemes, and then review existing access control optimizations based on Machine Learning (ML) and non-ML techniques. We demonstrate that ML-based methods can optimize the access control problem better than non-ML based methods, owing to their capability of solving high-complexity long-term optimization problems and learning experiential knowledge from reality. To further improve random access performance, we propose two-step learning optimizers for access control optimization, which execute traffic prediction and access control configuration as separate steps. In detail, our traffic prediction method relies on online supervised learning with Recurrent Neural Networks (RNNs), which can accurately capture traffic statistics over consecutive frames, and the access control configuration can use either a non-ML based controller or a cooperatively trained Deep Reinforcement Learning (DRL) based controller, depending on the complexity of the random access scheme. Numerical results show that the proposed two-step cooperative learning optimizer considerably outperforms the conventional Deep Q-Network (DQN), achieving higher training efficiency and better access performance.
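To make the first step of the pipeline concrete, below is a minimal, hypothetical sketch of online supervised traffic prediction with a recurrent network. It is not the paper's model: the traffic trace is synthetic (a sinusoidal load), the RNN is a single Elman-style layer with a fixed random recurrence, and only the linear readout is trained (one SGD step per new frame) instead of full backpropagation through time. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration (not the paper's exact model): a single-layer
# Elman-style RNN predicts the next frame's traffic load from the loads
# observed over the previous frames, trained online with one SGD step on
# the linear readout per new frame.

rng = np.random.default_rng(0)
H = 8                                      # hidden units
Wx = rng.normal(scale=1.0, size=(H, 1))    # input -> hidden weights (fixed)
Wh = rng.normal(scale=0.1, size=(H, H))    # hidden -> hidden weights (fixed)
Wo = np.zeros((1, H))                      # hidden -> output (trained online)
b = 0.0                                    # output bias (trained online)
lr = 0.1

def hidden_state(seq):
    """Run the recurrence over a sequence of scalar traffic loads."""
    h = np.zeros((H, 1))
    for x in seq:
        h = np.tanh(Wx * x + Wh @ h)
    return h

# Synthetic periodic traffic load per frame, roughly in [0.1, 0.9].
loads = [0.5 + 0.4 * np.sin(0.3 * t) for t in range(200)]

window, errors = 10, []
for t in range(window, len(loads)):
    h = hidden_state(loads[t - window:t])
    y = (Wo @ h).item() + b                # predicted load for frame t
    err = y - loads[t]
    errors.append(abs(err))
    Wo -= lr * err * h.T                   # online SGD on the readout only
    b -= lr * err
```

In a full system the predicted load would then feed the second step, i.e. the access control configuration (e.g. choosing the number of random access slots or the access-barring factor for the next frame), which the abstract proposes to handle with either a non-ML controller or a cooperatively trained DRL controller.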
