Paper Title

Off-Beat Multi-Agent Reinforcement Learning

Authors

Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan

Abstract

We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent, i.e., all actions have pre-set execution durations. During execution durations, the environment changes are influenced by, but not synchronised with, action execution. Such a setting is ubiquitous in many real-world problems. However, most MARL methods assume actions are executed immediately after inference, which is often unrealistic and can lead to catastrophic failure for multi-agent coordination with off-beat actions. In order to fill this gap, we develop an algorithmic framework for MARL with off-beat actions. We then propose a novel episodic memory, LeGEM, for model-free MARL algorithms. LeGEM builds agents' episodic memories by utilizing agents' individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment problem raised by the off-beat actions via our novel reward redistribution scheme, alleviating the issue of non-Markovian reward. We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks. Empirical results show that LeGEM significantly boosts multi-agent coordination and achieves leading performance and improved sample efficiency.
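The off-beat setting can be made concrete with a small sketch. The snippet below is a hypothetical Python illustration (the environment OffBeatToyEnv, the duration table ACTION_DURATIONS, and the helper redistribute_rewards are our own assumptions, not the paper's code): each committed action only takes effect after its pre-set execution duration, so the team reward arrives several steps after the joint action that caused it, and a naive redistribution shifts that delayed reward back to the committing step. This only exposes the temporal credit assignment problem the abstract describes; LeGEM's actual scheme builds per-agent episodic memories and redistributes reward with its own method, which is not reproduced here.

```python
# Hypothetical sketch of the off-beat action setting (not the paper's code).
import random

# Assumed per-action execution durations, in environment steps.
ACTION_DURATIONS = {0: 1, 1: 3, 2: 5}   # e.g. 0 = "move", 1 = "scout", 2 = "build"


class OffBeatToyEnv:
    """Toy two-agent environment: an action committed at step t only takes
    effect at step t + duration, so the environment keeps changing while the
    action executes and the reward is delayed relative to the committing step."""

    def __init__(self, n_agents=2, horizon=20):
        self.n_agents = n_agents
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.pending = []                    # (finish_step, agent_id, action)
        return [0.0] * self.n_agents         # dummy observations

    def step(self, actions):
        # Commit each agent's action; its effect lands only after its duration.
        for agent_id, a in enumerate(actions):
            self.pending.append((self.t + ACTION_DURATIONS[a], agent_id, a))

        # Actions whose execution finishes exactly now.
        finished = [p for p in self.pending if p[0] == self.t]
        self.pending = [p for p in self.pending if p[0] > self.t]

        # Team reward only if both agents' slow "build" actions finish together,
        # i.e. coordination must account for execution durations.
        reward = 1.0 if sum(a == 2 for _, _, a in finished) == self.n_agents else 0.0

        self.t += 1
        done = self.t >= self.horizon
        obs = [float(self.t)] * self.n_agents
        return obs, reward, done


def redistribute_rewards(rewards, trigger_duration=ACTION_DURATIONS[2]):
    """Naive redistribution for this toy env only: a reward observed at step t
    was triggered by actions committed at step t - trigger_duration, so move it
    back there. The paper's LeGEM uses episodic memories and its own scheme
    rather than a fixed shift like this."""
    shifted = [0.0] * len(rewards)
    for t, r in enumerate(rewards):
        if r != 0.0:
            shifted[max(t - trigger_duration, 0)] += r
    return shifted


if __name__ == "__main__":
    env = OffBeatToyEnv()
    env.reset()
    rewards, done = [], False
    while not done:
        joint_action = [random.choice(list(ACTION_DURATIONS)) for _ in range(env.n_agents)]
        _, r, done = env.step(joint_action)
        rewards.append(r)
    print("raw rewards:        ", rewards)
    print("redistributed (toy):", redistribute_rewards(rewards))
```

Running the loop with random actions shows the team reward appearing only when both delayed "build" actions happen to finish on the same step, several steps after they were chosen, which is exactly the kind of delayed, coordination-dependent signal the abstract attributes to off-beat actions.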
