Paper Title

Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Authors

Gen Li, Yuejie Chi, Yuting Wei, Yuxin Chen

Abstract

This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles, the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called \myalg~and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $$ \widetilde{O}\bigg( \frac{H^4 S \sum_{i=1}^m A_i}{\varepsilon^2} \bigg) $$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ is the number of actions of the $i$-th player. This is minimax-optimal (up to logarithmic factors) when the number of players is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.
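
To get a feel for the stated bound, the following minimal Python sketch evaluates its leading term $H^4 S \sum_{i=1}^m A_i / \varepsilon^2$, with constants and logarithmic factors dropped. The function names and the example numbers are assumptions made for illustration only. The second function swaps the sum for the product $\prod_{i=1}^m A_i$ purely as a hypothetical comparison of how a joint-action-space dependence would grow; it is not a bound taken from the paper or from any specific prior work.

```python
from math import prod


def leading_term_sum(H, S, actions, eps):
    """Leading term H^4 * S * sum_i A_i / eps^2 of the abstract's bound.

    Constants and logarithmic factors are dropped, so this is a scale,
    not an exact sample count.
    """
    return H**4 * S * sum(actions) / eps**2


def leading_term_product(H, S, actions, eps):
    """Hypothetical comparison: the same term with sum_i A_i replaced by
    prod_i A_i, i.e. a dependence on the joint action space (illustrative only).
    """
    return H**4 * S * prod(actions) / eps**2


if __name__ == "__main__":
    # Illustrative numbers (assumed): horizon 10, 100 states, 4 players, 5 actions each.
    H, S, actions, eps = 10, 100, [5, 5, 5, 5], 0.1
    by_sum = leading_term_sum(H, S, actions, eps)
    by_product = leading_term_product(H, S, actions, eps)
    print(f"sum-of-actions scale:     {by_sum:.3e}")
    print(f"product-of-actions scale: {by_product:.3e}")
    print(f"ratio (product / sum):    {by_product / by_sum:.2f}")
```

The ratio between the two quantities equals $\prod_i A_i / \sum_i A_i$, which grows exponentially in the number of players $m$ when every $A_i \ge 2$; this is the gap the abstract refers to as the curse of multiple agents.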
