Quantitative propagation of chaos for mean field Markov decision process with common noise

Paper title

Quantitative propagation of chaos for mean field Markov decision process with common noise

Authors

Médéric Motte, Huyên Pham

Abstract

We investigate propagation of chaos for mean field Markov decision processes with common noise (CMKV-MDP), when the optimization is performed over randomized open-loop controls on an infinite horizon. We first state a rate of convergence of order $M_N^\gamma$ for the value functions of the $N$-agent control problem with asymmetric open-loop controls towards the value function of the CMKV-MDP, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure and $\gamma \in (0,1]$ is an explicit constant. Furthermore, we show how to explicitly construct $(\epsilon+\mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators in the $N$-agent problem and the CMKV-MDP, and on a fine coupling of empirical measures.
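
To give a feel for the quantity $M_N$ in the abstract, the following is a minimal sketch that estimates the mean Wasserstein-1 distance between the empirical measure of $N$ i.i.d. samples and their law. It is an illustration only, not the authors' construction: the uniform law on $[0,1]$, the choice of the 1-Wasserstein distance, and the function names `w1_to_uniform` and `mean_w1` are all assumptions made here.

```python
import random


def w1_to_uniform(samples):
    """Exact 1-Wasserstein distance between the empirical measure of
    `samples` (points in [0, 1]) and the uniform law on [0, 1].

    Uses the one-dimensional quantile formula
    W_1 = integral over u in [0, 1] of |F_N^{-1}(u) - u| du,
    where F_N^{-1} equals the i-th order statistic on ((i-1)/N, i/N].
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, c in enumerate(xs):
        a, b = i / n, (i + 1) / n
        # Closed form of the integral of |c - u| over [a, b].
        if c <= a:
            total += ((b - c) ** 2 - (a - c) ** 2) / 2
        elif c >= b:
            total += ((c - a) ** 2 - (c - b) ** 2) / 2
        else:
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total


def mean_w1(n, trials=200, seed=0):
    """Monte Carlo estimate of E[W_1(empirical measure, uniform law)]
    for n i.i.d. uniform samples (an assumed stand-in for M_N)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += w1_to_uniform([rng.random() for _ in range(n)])
    return acc / trials


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, mean_w1(n))
```

In this one-dimensional toy setting the estimates should shrink roughly like $N^{-1/2}$ as $N$ grows. The exponent $\gamma$ from the abstract would then act on top of this rate, but its value depends on the model and is not reproduced here.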
