Title
CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning
Authors
Abstract
Due to the partial observability and communication constraints in many multi-agent reinforcement learning (MARL) tasks, centralized training with decentralized execution (CTDE) has become one of the most widely used MARL paradigms. In CTDE, centralized information is dedicated to learning the allocation of the team reward with a mixing network, while the learning of individual Q-values is usually based only on local observations. This insufficient utilization of global observation degrades performance in challenging environments. To this end, this work proposes a novel Centralized Teacher with Decentralized Student (CTDS) framework, which consists of a teacher model and a student model. Specifically, the teacher model allocates the team reward by learning individual Q-values conditioned on the global observation, while the student model utilizes partial observations to approximate the Q-values estimated by the teacher model. In this way, CTDS balances the full utilization of global observation during training with the feasibility of decentralized execution for online inference. Our CTDS framework is generic and can be applied on top of existing CTDE methods to boost their performance. We conduct experiments on a challenging set of StarCraft II micromanagement tasks to test the effectiveness of our method, and the results show that CTDS outperforms existing value-based MARL methods.
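The core teacher-student idea described above can be sketched as a distillation step: the teacher's Q-network conditions on the global state (available only during training), and the student's Q-network, which conditions only on local observations, regresses onto the teacher's Q-value estimates. The sketch below is a minimal toy illustration, not the paper's implementation: the network sizes, dimensions, and the way the local observation is derived from the global state are all hypothetical, and the teacher's own training through a mixing network (as in CTDE methods such as QMIX) is omitted.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """A small Q-value head; architecture is a placeholder, not the paper's."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, x):
        return self.net(x)

def distill(steps=200, batch=32, seed=0):
    torch.manual_seed(seed)
    GLOBAL_DIM, LOCAL_DIM, N_ACTIONS = 12, 4, 5  # hypothetical sizes

    teacher = QNet(GLOBAL_DIM, N_ACTIONS)  # sees global state (training only)
    student = QNet(LOCAL_DIM, N_ACTIONS)   # sees local obs (used at execution)

    # Toy data: pretend each agent's local observation is a slice of the
    # global state; in practice it comes from the environment.
    global_state = torch.randn(batch, GLOBAL_DIM)
    local_obs = global_state[:, :LOCAL_DIM]

    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    losses = []
    for _ in range(steps):
        with torch.no_grad():
            target_q = teacher(global_state)  # teacher's Q-value estimates
        # Distillation objective: student matches teacher's Q-values
        # while only seeing its partial observation.
        loss = nn.functional.mse_loss(student(local_obs), target_q)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

if __name__ == "__main__":
    first, last = distill()
    print(f"distillation loss: {first:.4f} -> {last:.4f}")
```

At execution time only the student is deployed, so each agent acts from its own partial observation, preserving decentralized execution while the global information has still shaped the learned Q-values.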