Paper Title

Hierarchical Control of Multi-Agent Systems using Online Reinforcement Learning

Paper Authors

He Bai, Jemin George, Aranya Chakrabortty

Paper Abstract

We propose a new reinforcement learning-based approach to designing hierarchical linear quadratic regulator (LQR) controllers for heterogeneous linear multi-agent systems with unknown state-space models and separated control objectives. The separation arises from grouping the agents into multiple non-overlapping groups and defining the control goal as two distinct objectives. The first objective aims to minimize a group-wise block-decentralized LQR function that models group-level missions. The second objective, on the other hand, tries to minimize an LQR function defined between the average states (centroids) of the groups. Exploiting this separation, we redefine the weighting matrices of the LQR functions in a way that allows us to decouple their respective algebraic Riccati equations. Thereafter, we develop a reinforcement learning strategy that uses online measurements of the agent states and the average states to learn the respective controllers based on the approximate Riccati equations. Because the first controller is block-decentralized and can therefore be learned in parallel, and the second controller is reduced-dimensional due to averaging, the overall design enjoys a significantly reduced learning time compared to centralized reinforcement learning.
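
To make the decomposition concrete, below is a minimal Python sketch of the hierarchical LQR structure the abstract describes, assuming known agent models for illustration; the paper's contribution is learning these gains online from state measurements without model knowledge. The group sizes, random models, weighting matrices, and the group-averaged centroid dynamics are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, block_diag

rng = np.random.default_rng(0)

def lqr_gain(A, B, Q, R):
    # Solve the continuous-time algebraic Riccati equation and return K = R^{-1} B^T P.
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Hypothetical heterogeneous agents: state dim n = 2, input dim m = 1,
# partitioned into two non-overlapping groups.
n, m = 2, 1
group_sizes = [3, 4]
agents = [[(rng.normal(size=(n, n)), rng.normal(size=(n, m))) for _ in range(N)]
          for N in group_sizes]

# Objective 1: group-wise block-decentralized LQR -- one Riccati equation
# per group, so the group-level gains can be computed (or learned) in parallel.
local_gains = []
for members in agents:
    A_g = block_diag(*[A for A, _ in members])
    B_g = block_diag(*[B for _, B in members])
    local_gains.append(lqr_gain(A_g, B_g,
                                np.eye(A_g.shape[0]), np.eye(B_g.shape[1])))

# Objective 2: reduced-dimensional LQR on the group centroids
# xbar_g = (1/N_g) * sum_i x_i; here we approximate the centroid
# dynamics by the group-averaged model (an illustrative simplification).
A_bar = block_diag(*[sum(A for A, _ in g) / len(g) for g in agents])
B_bar = block_diag(*[sum(B for _, B in g) / len(g) for g in agents])

# Penalize disagreement between the two centroids via a Laplacian-type
# weight, regularized so the Riccati equation stays solvable.
L = np.array([[1.0, -1.0], [-1.0, 1.0]])
Q_bar = np.kron(L, np.eye(n)) + 1e-3 * np.eye(2 * n)
K_bar = lqr_gain(A_bar, B_bar, Q_bar, np.eye(B_bar.shape[1]))

print([K.shape for K in local_gains], K_bar.shape)  # [(3, 6), (4, 8)] (2, 4)
```

Note how the dimensions reflect the claimed speedup: the two group-level Riccati equations (sizes 6 and 8 here) are independent and can be solved or learned in parallel, while the centroid problem has dimension 2n per group regardless of how many agents each group contains.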
