Thesis Title
Decentralized Deep Reinforcement Learning for Network Level Traffic Signal Control
Thesis Author
Thesis Abstract
In this thesis, I propose a family of fully decentralized deep multi-agent reinforcement learning (MARL) algorithms to achieve high real-time performance in network-level traffic signal control. In this approach, each intersection is modeled as an agent that plays a Markovian game against the other intersection nodes in a traffic signal network modeled as an undirected graph, so as to approach the optimal reduction in delay. Formulated as a partially observable Markov decision process (POMDP), three levels of communication schemes between adjacent learning agents are proposed: independent deep Q-learning (IDQL), shared-states reinforcement learning (S2RL), and a shared-states-and-rewards variant of S2RL (S2R2L). In all three variants of the decentralized MARL scheme, each agent trains its local deep Q-network (DQN) separately, enhanced by convergence-promoting techniques such as double DQN, prioritized experience replay, and multi-step bootstrapping. To test the performance of the three proposed MARL algorithms, a SUMO-based simulation platform is developed to mimic real-world traffic evolution. Fed with random traffic demand between permitted OD pairs, a 4x4 Manhattan-style grid network is set up as the testbed, and two different vehicle arrival rates are generated for model training and testing. The experimental results show that S2R2L has a quicker convergence rate and better convergent performance than IDQL and S2RL during training. Moreover, all three MARL schemes reveal exceptional generalization ability: their testing results surpass the benchmark Max Pressure (MP) algorithm under the criteria of average vehicle delay, network-level queue length, and fuel consumption rate. Notably, S2R2L achieves the best testing performance, reducing traffic delay by 34.55% and queue length by 10.91% compared with MP.
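
To make the per-agent training signal concrete, the following is a minimal PyTorch sketch of the update target implied by the abstract, combining double DQN with multi-step bootstrapping. It is an illustrative assumption, not the thesis's actual code: the function name n_step_double_dqn_target, the network handles, and all tensor shapes are hypothetical.

    # Illustrative sketch only: each decentralized agent would form an
    # n-step double DQN target from its local (or neighbor-augmented)
    # observation. Names and shapes are assumptions, not the thesis code.
    import torch

    def n_step_double_dqn_target(online_net, target_net, rewards,
                                 next_state, done, gamma=0.99):
        """rewards    -- list of the n rewards r_t, ..., r_{t+n-1}
        next_state -- observation at step t+n (local state, possibly
                      concatenated with neighbors' states under S2RL/S2R2L)
        done       -- whether the episode ended within the n steps
        """
        # Accumulate the discounted n-step return: sum_k gamma^k * r_{t+k}.
        g = 0.0
        for k, r in enumerate(rewards):
            g += (gamma ** k) * r
        if done:
            return torch.tensor(g)
        with torch.no_grad():
            # Double DQN: the online network selects the greedy action ...
            a_star = online_net(next_state).argmax(dim=-1)
            # ... while the target network evaluates it, which is what
            # mitigates the Q-value overestimation of vanilla DQN.
            q_eval = target_net(next_state).gather(
                -1, a_star.unsqueeze(-1)).squeeze(-1)
        return g + (gamma ** len(rewards)) * q_eval

In a prioritized-experience-replay setup, the absolute gap between this target and the online network's current Q-value would typically serve as the sampling priority for the transition.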
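
For reference, the Max Pressure benchmark against which the MARL schemes are compared can be summarized in a few lines: at each decision step, an intersection activates the phase whose permitted movements carry the largest total "pressure" (upstream queue minus downstream queue). The sketch below is a generic rendition of that rule; the data structures phases and queue are hypothetical placeholders, not the thesis implementation.

    # Hedged sketch of the Max Pressure (MP) benchmark rule.
    def max_pressure_phase(phases, queue):
        """phases -- {phase_id: [(upstream_lane, downstream_lane), ...]}
        queue  -- {lane_id: current queue length in vehicles}
        Returns the phase with maximal total pressure."""
        def pressure(movements):
            # Pressure of a movement = upstream queue - downstream queue.
            return sum(queue[u] - queue[d] for u, d in movements)
        return max(phases, key=lambda p: pressure(phases[p]))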