Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Paper title

Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Authors

Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li

Abstract

Multi-agent reinforcement learning is difficult to apply in practice, partly because of the gap between simulated and real-world scenarios. One reason for this gap is that simulated systems assume agents work normally at all times, whereas in practice one or more agents may unexpectedly "crash" during coordination because of unavoidable hardware or software failures. Such crashes disrupt cooperation among agents and degrade performance. In this work, we present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes. To make the system more robust to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent that adjusts the crash rate during training. We design three coaching strategies and a re-sampling strategy for the coach agent. To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed crash rate strategy and the curriculum learning strategy. An ablation study further illustrates the effectiveness of the re-sampling strategy.
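
The idea of a coach that adjusts the crash rate during training can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the class name `AdaptiveCoach`, the threshold-based update rule, the step size, and the helper `sample_crash_mask` are all hypothetical, since the abstract does not specify how the coach decides.

```python
import random


class AdaptiveCoach:
    """Hypothetical coach: raises the crash rate when the team performs well,
    lowers it otherwise. The rule is an illustrative assumption."""

    def __init__(self, threshold=0.6, step=0.05, max_rate=0.5):
        self.threshold = threshold  # assumed performance level that triggers harder training
        self.step = step            # assumed increment applied to the crash rate
        self.max_rate = max_rate    # assumed upper bound on the crash rate
        self.crash_rate = 0.0       # start with no crashes

    def update(self, win_rate):
        """Adjust the crash rate from the most recent evaluation win rate."""
        if win_rate >= self.threshold:
            self.crash_rate = min(self.max_rate, self.crash_rate + self.step)
        else:
            self.crash_rate = max(0.0, self.crash_rate - self.step)
        return self.crash_rate


def sample_crash_mask(n_agents, crash_rate, rng):
    """Return one boolean per agent; True marks an agent as crashed this episode."""
    return [rng.random() < crash_rate for _ in range(n_agents)]


if __name__ == "__main__":
    rng = random.Random(0)
    coach = AdaptiveCoach()
    # Made-up win rates standing in for periodic evaluation results.
    for win_rate in [0.7, 0.8, 0.4, 0.9]:
        rate = coach.update(win_rate)
        mask = sample_crash_mask(n_agents=5, crash_rate=rate, rng=rng)
        print(f"win_rate={win_rate:.1f} crash_rate={rate:.2f} crashed={mask}")
```

In a real training loop, the sampled mask would decide which agents stop acting in an episode, and the win rate would come from evaluating the current policies; both are left out of this sketch.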
