Paper Title
Model-free Reinforcement Learning for Non-stationary Mean Field Games
Paper Authors
Paper Abstract
In this paper, we consider finite-horizon, non-stationary mean field games (MFGs) with a large population of homogeneous players who sequentially make strategic decisions, where each player is affected by the other players through an aggregate population state termed the mean field state. Each player has a private type that only it can observe, while the mean field population state, representing the empirical distribution of the other players' types, is shared among all players. Recently, the authors in [1] provided a sequential decomposition algorithm to compute the mean field equilibrium (MFE) of such games, which allows equilibrium policies to be computed in time linear in the horizon rather than exponential, as before. In this paper, we extend that approach to the case where the state transitions are unknown and propose a reinforcement learning algorithm based on Expected Sarsa with a policy gradient approach, which learns the MFE policy while simultaneously learning the dynamics of the game. We illustrate our results using a cyber-physical security example.
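
As a rough illustration of the Expected Sarsa component mentioned in the abstract, the following Python sketch shows a tabular Expected Sarsa update for a single representative player whose state is the pair (private type, discretized mean field bin). This is a minimal sketch under assumed placeholder names and dimensions (n_types, n_mf_bins, n_actions, alpha, gamma, policy); it is not the authors' actual algorithm, which additionally involves the sequential decomposition of [1] and a policy gradient step.

import numpy as np

# Hypothetical problem sizes and learning parameters (illustrative only).
n_types, n_mf_bins, n_actions = 2, 5, 2
alpha, gamma = 0.1, 0.95

# Action-value table indexed by (private type, mean field bin, action),
# and a soft (stochastic) policy over actions for each such state.
Q = np.zeros((n_types, n_mf_bins, n_actions))
policy = np.full((n_types, n_mf_bins, n_actions), 1.0 / n_actions)

def expected_sarsa_update(x, z, a, r, x_next, z_next):
    """One Expected Sarsa step: bootstrap with the policy-weighted value
    of the next (type, mean field) state instead of a sampled next action."""
    expected_next = np.dot(policy[x_next, z_next], Q[x_next, z_next])
    td_target = r + gamma * expected_next
    Q[x, z, a] += alpha * (td_target - Q[x, z, a])

# Example transition: type 0 in mean field bin 2 takes action 1, receives
# reward 0.5, and moves to type 1 with the mean field in bin 3.
expected_sarsa_update(0, 2, 1, 0.5, 1, 3)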