Paper Title
Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control
Paper Authors
Paper Abstract
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithms.
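The abstract describes a dual-loop policy optimization scheme for risk-sensitive LQG control. Since risk-sensitive (exponential-cost) LQG is classically connected to a zero-sum linear quadratic dynamic game, the following is a minimal, model-based Python sketch of one plausible dual-loop scheme for that game: the inner loop runs policy iteration over the worst-case disturbance gain for a fixed control gain, and the outer loop improves the control gain against that worst case. The matrices, update rules, and risk parameter below are illustrative assumptions for exposition, not the paper's exact recursions or data.

```python
# Illustrative sketch (assumed formulation): dual-loop policy optimization for
# the zero-sum LQ game  x_{k+1} = A x_k + B u_k + D w_k  with stage cost
# x'Qx + u'Ru - gamma^2 w'w, u = -Kx minimizing and w = Lx maximizing.
import numpy as np

def lyapunov_value(A_cl, M, iters=2000):
    """Evaluate P = M + A_cl' P A_cl by fixed-point iteration (A_cl assumed Schur stable)."""
    P = np.zeros_like(M)
    for _ in range(iters):
        P = M + A_cl.T @ P @ A_cl
    return P

def inner_loop(A, B, D, Q, R, gamma, K, n_iter=50):
    """Inner loop: for a fixed control gain K, policy iteration over the
    worst-case disturbance gain L (the maximizing player)."""
    n, d = A.shape[0], D.shape[1]
    L = np.zeros((d, n))
    A_K = A - B @ K
    for _ in range(n_iter):
        A_cl = A_K + D @ L
        M = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
        P = lyapunov_value(A_cl, M)
        # Maximizer improvement; requires gamma^2 I - D' P D to be positive definite.
        L = np.linalg.solve(gamma**2 * np.eye(d) - D.T @ P @ D, D.T @ P @ A_K)
    return L, P

def dual_loop(A, B, D, Q, R, gamma, n_outer=30):
    """Outer loop: improve the control gain K against the worst-case
    disturbance policy returned by the inner loop."""
    n, m = B.shape
    K = np.zeros((m, n))  # assumes the uncontrolled system is already stable
    for _ in range(n_outer):
        L, P = inner_loop(A, B, D, Q, R, gamma, K)
        # Minimizer improvement with the disturbance fixed at w = L x.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
    return K, P

if __name__ == "__main__":
    # Hypothetical 2-state, 1-input example data.
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [0.1]])
    D = np.array([[0.05], [0.05]])
    Q, R, gamma = np.eye(2), np.eye(1), 5.0
    K, P = dual_loop(A, B, D, Q, R, gamma)
    print("control gain K:\n", K)
```

In this sketch the model matrices are known; the paper's model-free off-policy variant would instead estimate the quantities needed for the two improvement steps from trajectory data, which is not reproduced here.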