Guided Safe Shooting: model based reinforcement learning with safety constraints

Paper Title

Guided Safe Shooting: model based reinforcement learning with safety constraints

Authors

Giuseppe Paolo, Jonas Gonzalez-Billandon, Albert Thomas, Balázs Kégl

Abstract

In the last decade, reinforcement learning has successfully solved complex control tasks and decision-making problems, such as the board game Go. Yet there are few success stories when it comes to deploying those algorithms in real-world scenarios. One of the reasons is the lack of guarantees when dealing with and avoiding unsafe states, a fundamental requirement in critical control engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints. The model is learned from the data collected during the operation of the system in an iterated batch fashion, and is then used to plan the best action to perform at each time step. We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm. Experiments show that these planners help the learning agent avoid unsafe situations while maximally exploring the state space, a necessary aspect when learning an accurate model of the system. Furthermore, compared to model-free approaches, learning a model allows GuSS to reduce the number of interactions with the real system while still reaching high rewards, a fundamental requirement when handling engineering systems.
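The abstract does not spell out how the planners work, so the following is only a minimal sketch of what a safety-aware random shooting planner could look like, not the authors' implementation. It assumes a learned model that predicts the next state, a reward, and a safety cost for each action; `toy_model`, `safe_random_shooting`, the action range, and the rule of ranking candidates first by predicted cost and then by predicted reward are all hypothetical choices made for illustration.

```python
import random


def toy_model(state, action):
    """Hypothetical stand-in for a learned model of a 1-D system."""
    next_state = state + action
    reward = -abs(next_state - 1.0)            # closer to the target at 1.0 is better
    cost = 1.0 if next_state > 1.5 else 0.0    # assumed unsafe region: state > 1.5
    return next_state, reward, cost


def safe_random_shooting(model, state, n_candidates=200, horizon=5, rng=None):
    """Sample random action sequences, roll them out in the model, and
    return the first action of the sequence with the lowest predicted
    safety cost, breaking ties by the highest predicted reward."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    best_key, best_first_action = None, None
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total_reward, total_cost = state, 0.0, 0.0
        for a in actions:
            s, r, c = model(s, a)
            total_reward += r
            total_cost += c
        key = (total_cost, -total_reward)  # lexicographic: cost first, then reward
        if best_key is None or key < best_key:
            best_key, best_first_action = key, actions[0]
    return best_first_action, best_key


if __name__ == "__main__":
    action, (predicted_cost, neg_reward) = safe_random_shooting(toy_model, 0.0)
    print("first action:", action)
    print("predicted cost:", predicted_cost, "predicted reward:", -neg_reward)
```

In a planning loop of the kind the abstract describes, only the returned first action would be executed on the real system, and planning would be repeated from the newly observed state at the next time step.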
