Paper Title

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

Paper Authors

Zuxin Liu, Baiming Chen, Hongyi Zhou, Guru Koushik, Martial Hebert, Ding Zhao

Paper Abstract

Multi-agent navigation in dynamic environments is of great industrial value when deploying a large-scale fleet of robots to real-world applications. This paper proposes a decentralized partially observable multi-agent path planning with evolutionary reinforcement learning (MAPPER) method to learn an effective local planning policy in mixed dynamic environments. Reinforcement learning-based methods usually suffer performance degradation on long-horizon tasks with goal-conditioned sparse rewards, so we decompose the long-range navigation task into many easier sub-tasks under the guidance of a global planner, which increases agents' performance in large environments. Moreover, most existing multi-agent planning approaches assume either perfect information of the surrounding environment or homogeneity of nearby dynamic agents, which may not hold in practice. Our approach models dynamic obstacles' behavior with an image-based representation and trains a policy in mixed dynamic environments without the homogeneity assumption. To ensure multi-agent training stability and performance, we propose an evolutionary training approach that can easily be scaled to large and complex environments. Experiments show that MAPPER is able to achieve higher success rates and more stable performance when exposed to a large number of non-cooperative dynamic obstacles, compared with the traditional reaction-based planner LRA* and a state-of-the-art learning-based method.
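The abstract describes decomposing the long-range navigation task into easier sub-tasks under the guidance of a global planner. A minimal sketch of that idea, assuming a 4-connected occupancy grid, an A* global planner, and a hypothetical `horizon` waypoint spacing (none of these details are given in the abstract, and this is not MAPPER's actual implementation):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free cell, 1 = obstacle).

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]  # (f, g, cell, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:          # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:               # reconstruct path by walking parents back
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, cur))
    return None

def subgoals(path, horizon):
    """Sample a waypoint every `horizon` steps, always ending at the final goal,
    so the local policy chases a nearby sub-goal instead of the distant goal."""
    return path[horizon::horizon] + ([path[-1]] if (len(path) - 1) % horizon else [])
```

Each sampled waypoint becomes the local policy's current goal, which keeps the goal-conditioned reward signal dense relative to steering toward the distant final goal directly.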
