Paper Title

Variational Policy Propagation for Multi-agent Reinforcement Learning

Paper Authors

Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, Le Song

Paper Abstract

We propose a \emph{collaborative} multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a \emph{joint} policy through the interactions over agents. We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively. We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable. We evaluate our algorithm on several large-scale challenging tasks and demonstrate that it outperforms the previous state of the art.
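To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of how a joint policy factored as a pairwise Markov Random Field over agents' discrete actions can be approximated with mean-field variational inference, the family of techniques the abstract refers to. The function name, the toy potentials, and the graph structure are all illustrative assumptions.

```python
import numpy as np

def mean_field_joint_policy(unary, pairwise, edges, n_iters=20):
    """Approximate per-agent action marginals of a pairwise MRF policy.

    unary:    (n_agents, n_actions) log-potentials (e.g. per-agent logits)
    pairwise: dict mapping edge (i, j) -> (n_actions, n_actions) log-potentials
    edges:    list of (i, j) agent pairs that interact
    """
    n_agents, n_actions = unary.shape
    q = np.full((n_agents, n_actions), 1.0 / n_actions)  # uniform init
    for _ in range(n_iters):
        for i in range(n_agents):
            # Mean-field update: expected log-potentials under neighbors' q
            logit = unary[i].copy()
            for (a, b) in edges:
                if a == i:
                    logit += pairwise[(a, b)] @ q[b]
                elif b == i:
                    logit += pairwise[(a, b)].T @ q[a]
            logit -= logit.max()  # numerical stability
            q[i] = np.exp(logit) / np.exp(logit).sum()
    return q

# Toy example: 3 agents, 4 actions each, a chain interaction graph.
rng = np.random.default_rng(0)
unary = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 2)]
pairwise = {e: rng.normal(size=(4, 4)) for e in edges}
q = mean_field_joint_policy(unary, pairwise, edges)
actions = [int(np.argmax(q[i])) for i in range(3)]  # greedy joint action
```

Because each mean-field update is built from differentiable operations (matrix products and softmax), unrolling a fixed number of iterations as network layers keeps the overall policy differentiable, which is the property the abstract exploits.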
