Paper Title
Multi-Agent Trust Region Policy Optimization
Paper Authors
Paper Abstract
We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for the multi-agent case. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. Agents do not need to know the observations, rewards, policies, or value/action-value functions of other agents; during training, each agent only shares a likelihood ratio with its neighbors. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
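A minimal sketch of the policy update the abstract refers to, using standard TRPO notation; the multi-agent factorization below, with local policies \pi^{i}_{\theta_i} acting on local observations o^{i}, is an illustrative assumption consistent with the abstract rather than the paper's exact consensus formulation.

% Standard single-agent TRPO policy update: maximize the surrogate objective
% subject to a KL-divergence trust-region constraint of radius delta.
\begin{align}
\max_{\theta}\;& \mathbb{E}_{s,a\sim\pi_{\theta_{\mathrm{old}}}}\!\left[\frac{\pi_{\theta}(a\mid s)}{\pi_{\theta_{\mathrm{old}}}(a\mid s)}\,A^{\pi_{\theta_{\mathrm{old}}}}(s,a)\right] \\
\text{s.t.}\;& \mathbb{E}_{s}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot\mid s)\,\|\,\pi_{\theta}(\cdot\mid s)\big)\right]\le\delta .
\end{align}
% Assumed multi-agent structure: if the joint policy of N agents factorizes into
% local policies conditioned on local observations, the joint likelihood ratio
% is a product of local ratios.
\begin{align}
\frac{\pi_{\theta}(a\mid s)}{\pi_{\theta_{\mathrm{old}}}(a\mid s)}
= \prod_{i=1}^{N}\frac{\pi^{i}_{\theta_i}(a^{i}\mid o^{i})}{\pi^{i}_{\theta_{i,\mathrm{old}}}(a^{i}\mid o^{i})}.
\end{align}

Under this factorization, the local ratio is the kind of quantity the abstract describes agents exchanging with neighbors, while the surrogate objective and trust-region constraint are distributed across agents through the consensus optimization; the exact model and approximations are given in the paper.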