Paper Title
Multi-Agent Trust Region Policy Optimization
Paper Authors
Paper Abstract
We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for the multi-agent case. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. Agents do not need to know the observations, rewards, policies, or value/action-value functions of other agents; during training, each agent only shares a likelihood ratio with its neighbors. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
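A minimal sketch of the policy update the abstract refers to, using standard TRPO notation; the multi-agent factorization below, with local policies \pi^{i}_{\theta_i} acting on local observations o^{i}, is an illustrative assumption consistent with the abstract rather than the paper's exact consensus formulation.

% Standard single-agent TRPO policy update: maximize the surrogate objective
% subject to a KL-divergence trust-region constraint of radius delta.
\begin{align}
\max_{\theta}\;& \mathbb{E}_{s,a\sim\pi_{\theta_{\mathrm{old}}}}\!\left[\frac{\pi_{\theta}(a\mid s)}{\pi_{\theta_{\mathrm{old}}}(a\mid s)}\,A^{\pi_{\theta_{\mathrm{old}}}}(s,a)\right] \\
\text{s.t.}\;& \mathbb{E}_{s}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot\mid s)\,\|\,\pi_{\theta}(\cdot\mid s)\big)\right]\le\delta .
\end{align}
% Assumed multi-agent structure: if the joint policy of N agents factorizes into
% local policies conditioned on local observations, the joint likelihood ratio
% is a product of local ratios.
\begin{align}
\frac{\pi_{\theta}(a\mid s)}{\pi_{\theta_{\mathrm{old}}}(a\mid s)}
= \prod_{i=1}^{N}\frac{\pi^{i}_{\theta_i}(a^{i}\mid o^{i})}{\pi^{i}_{\theta_{i,\mathrm{old}}}(a^{i}\mid o^{i})}.
\end{align}

Under this factorization, the local ratio is the kind of quantity the abstract describes agents exchanging with neighbors, while the surrogate objective and trust-region constraint are distributed across agents through the consensus optimization; the exact model and approximations are given in the paper.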