Paper Title
Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control
Paper Authors
Paper Abstract
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithms.
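The abstract describes a dual-loop policy optimization scheme for risk-sensitive LQG control. Since risk-sensitive (exponential-cost) LQG is classically connected to a zero-sum linear quadratic dynamic game, the following is a minimal, model-based Python sketch of one plausible dual-loop scheme for that game: the inner loop runs policy iteration over the worst-case disturbance gain for a fixed control gain, and the outer loop improves the control gain against that worst case. The matrices, update rules, and risk parameter below are illustrative assumptions for exposition, not the paper's exact recursions or data.

```python
# Illustrative sketch (assumed formulation): dual-loop policy optimization for
# the zero-sum LQ game  x_{k+1} = A x_k + B u_k + D w_k  with stage cost
# x'Qx + u'Ru - gamma^2 w'w, u = -Kx minimizing and w = Lx maximizing.
import numpy as np

def lyapunov_value(A_cl, M, iters=2000):
    """Evaluate P = M + A_cl' P A_cl by fixed-point iteration (A_cl assumed Schur stable)."""
    P = np.zeros_like(M)
    for _ in range(iters):
        P = M + A_cl.T @ P @ A_cl
    return P

def inner_loop(A, B, D, Q, R, gamma, K, n_iter=50):
    """Inner loop: for a fixed control gain K, policy iteration over the
    worst-case disturbance gain L (the maximizing player)."""
    n, d = A.shape[0], D.shape[1]
    L = np.zeros((d, n))
    A_K = A - B @ K
    for _ in range(n_iter):
        A_cl = A_K + D @ L
        M = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
        P = lyapunov_value(A_cl, M)
        # Maximizer improvement; requires gamma^2 I - D' P D to be positive definite.
        L = np.linalg.solve(gamma**2 * np.eye(d) - D.T @ P @ D, D.T @ P @ A_K)
    return L, P

def dual_loop(A, B, D, Q, R, gamma, n_outer=30):
    """Outer loop: improve the control gain K against the worst-case
    disturbance policy returned by the inner loop."""
    n, m = B.shape
    K = np.zeros((m, n))  # assumes the uncontrolled system is already stable
    for _ in range(n_outer):
        L, P = inner_loop(A, B, D, Q, R, gamma, K)
        # Minimizer improvement with the disturbance fixed at w = L x.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
    return K, P

if __name__ == "__main__":
    # Hypothetical 2-state, 1-input example data.
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [0.1]])
    D = np.array([[0.05], [0.05]])
    Q, R, gamma = np.eye(2), np.eye(1), 5.0
    K, P = dual_loop(A, B, D, Q, R, gamma)
    print("control gain K:\n", K)
```

In this sketch the model matrices are known; the paper's model-free off-policy variant would instead estimate the quantities needed for the two improvement steps from trajectory data, which is not reproduced here.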