Paper Title
Distributed Q-Learning with State Tracking for Multi-agent Networked Control
Paper Authors
Paper Abstract
This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) in a multi-agent network. Existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for the agents. Specifically, each agent maintains a local estimate of the global state based on its local information and communication with its neighbors. At each step, every agent updates its local estimate of the global state, based on which it locally solves for an approximate Q-factor through policy iteration. Assuming the injected excitation noise decays during policy evaluation, we prove that the local estimates converge to the true global state and establish the convergence of the proposed distributed ST-based Q-learning algorithm. Experimental studies corroborate our theoretical results, showing that the proposed method achieves performance comparable to the centralized case.
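As a rough illustration of the state-tracking component described in the abstract, the sketch below has each agent mix its neighbors' estimates of the global state through a doubly stochastic matrix W while applying a feedback control computed from its own estimate, plus exploration noise whose magnitude decays over time. The ring topology, the matrix W, the dynamics (A, B), and the gain K are all assumptions made for this example and are not taken from the paper; the policy-iteration step that actually learns the Q-factor is omitted.

```python
import numpy as np

# Minimal sketch of consensus-based state tracking with decaying
# excitation noise. The ring topology, mixing matrix W, dynamics (A, B),
# and gain K are illustrative assumptions, not the paper's construction.

rng = np.random.default_rng(0)

n_agents, dim = 3, 2                   # 3 agents, each owning a 2-dim block
N = n_agents * dim                     # global state dimension

# Doubly stochastic mixing matrix for an assumed 3-agent ring graph.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

A = 0.95 * np.eye(N)                   # placeholder open-loop dynamics
B = np.eye(N)                          # placeholder input matrix
K = 0.10 * np.eye(N)                   # placeholder stabilizing gain

x = rng.normal(size=N)                 # true global state
x_hat = np.zeros((n_agents, N))        # each row: one agent's global-state estimate
for i in range(n_agents):
    blk = slice(i * dim, (i + 1) * dim)
    x_hat[i, blk] = x[blk]             # agents start knowing only their own block

for t in range(200):
    sigma = 1.0 * 0.95 ** t            # decaying excitation noise level
    u = np.zeros(N)
    for i in range(n_agents):
        blk = slice(i * dim, (i + 1) * dim)
        # Each agent acts on its *estimate* of the global state and adds
        # exploration noise that decays over time.
        u[blk] = (-K @ x_hat[i])[blk] + sigma * rng.normal(size=dim)
    x = A @ x + B @ u                  # global system evolves

    # Consensus step: mix neighbors' estimates, then overwrite each
    # agent's own block with its exactly observed local state.
    x_hat = W @ x_hat
    for i in range(n_agents):
        blk = slice(i * dim, (i + 1) * dim)
        x_hat[i, blk] = x[blk]

print("max state-tracking error:", np.abs(x_hat - x).max())
```

In the paper's algorithm, these tracked estimates would then feed each agent's local policy-evaluation and policy-improvement steps; here the loop only demonstrates how the local estimates approach the true global state as the excitation noise decays.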