Paper Title


F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

Authors

Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha

Abstract

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications, due to the non-interactivity between agents, the curse of dimensionality, and computational complexity. Hence, several decentralized MARL algorithms have been motivated. However, existing decentralized methods only handle the fully cooperative setting, where massive amounts of information need to be transmitted during training. The block coordinate gradient descent scheme they use for successive, independent actor and critic steps can simplify the computation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can incorporate most actor-critic methods and handle the large-scale general cooperative multi-agent setting. A primal-dual hybrid gradient descent type algorithm framework is designed to learn each agent separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability for large-scale environments and reduce information transmission, via a parameter sharing mechanism and a novel modeling-other-agents method based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.
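To make the contrast with block-coordinate schemes concrete, the sketch below shows a single decentralized agent that performs policy improvement (actor) and value evaluation (critic) in one combined update per transition, rather than alternating independent actor and critic steps. This is a minimal tabular illustration, not the paper's F2A2 algorithm; the toy 2-state, 2-action environment, learning rates, and all function names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): one agent jointly updates
# its policy logits and value table from each transition, so the actor and
# critic steps share the same TD error instead of being optimized in
# alternating, independent blocks.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy logits (actor parameters)
v = np.zeros(n_states)                   # state-value table (critic)
alpha, beta, gamma = 0.1, 0.2, 0.9       # actor lr, critic lr, discount

def policy(s):
    """Softmax policy over actions in state s."""
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def joint_update(s, a, r, s_next):
    """One combined actor-critic update from a single transition."""
    td_error = r + gamma * v[s_next] - v[s]
    v[s] += beta * td_error                    # critic: value evaluation
    grad_log = -policy(s)
    grad_log[a] += 1.0                         # d log pi(a|s) / d theta[s]
    theta[s] += alpha * td_error * grad_log    # actor: policy improvement
    return td_error

# Toy environment: only action 0 in state 0 is rewarded and keeps the
# agent in state 0; everything else yields reward 0 and moves to state 1.
for _ in range(200):
    s = int(rng.integers(n_states))
    a = int(rng.choice(n_actions, p=policy(s)))
    r = 1.0 if (s == 0 and a == 0) else 0.0
    joint_update(s, a, r, s_next=0 if r > 0 else 1)

print(policy(0))  # probability of the rewarded action 0 grows in state 0
```

In a fully decentralized setting, each agent would run an update of this shape on its own parameters; the paper's primal-dual hybrid gradient formulation couples the two steps through a shared objective rather than the shared TD error used in this toy version.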
