Title
Adaptive Shielding under Uncertainty
Authors
Abstract
This paper targets control problems that exhibit specific safety and performance requirements. In particular, the aim is to ensure that an agent operating under uncertainty strictly adheres to such requirements at runtime. Previous works create so-called shields that correct an existing controller for the agent if it is about to take unbearable safety risks. However, shields have so far not accounted for the fact that, in complex control and learning tasks, the environment may not be fully known in advance and may evolve over time. We propose a new method for efficiently computing a shield that adapts to a changing environment. In particular, we base our method on problems that are sufficiently captured by potentially infinite Markov decision processes (MDPs) and quantitative specifications such as mean-payoff objectives. The shield is independent of the controller, which may, for instance, take the form of a high-performing reinforcement learning agent. At runtime, our method builds an internal abstract representation of the MDP and constantly adapts this abstraction and the shield based on observations from the environment. We showcase the applicability of our method on an urban traffic control problem.
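To make the idea of an adaptive shield more concrete, here is a minimal Python sketch built on strong simplifying assumptions. It is not the authors' construction. It assumes a finite, already-abstracted state space, a pure safety objective (avoid a hypothetical set of unsafe states) in place of the paper's quantitative mean-payoff specifications, and transition probabilities estimated from observed frequencies. The class `AdaptiveShield`, its parameters `horizon` and `threshold`, and the toy traffic states are all hypothetical. The shield lets the controller's proposed action through unless its estimated risk exceeds the threshold, and each new observation updates the estimates it relies on.

```python
from collections import defaultdict


class AdaptiveShield:
    """Hypothetical adaptive safety shield over a finite abstract MDP.

    Transition probabilities are estimated from observed transitions, and
    the shield is recomputed from these estimates whenever it is queried.
    """

    def __init__(self, states, actions, unsafe, horizon=5, threshold=0.1):
        self.states = list(states)
        self.actions = list(actions)
        self.unsafe = set(unsafe)
        self.horizon = horizon      # look-ahead depth (assumed parameter)
        self.threshold = threshold  # tolerated risk (assumed parameter)
        # counts[(s, a)][s2] = how often s --a--> s2 has been observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s2):
        """Adapt the abstraction by recording one observed transition."""
        self.counts[(s, a)][s2] += 1

    def _action_risk(self, s, a, v):
        """Expected risk of taking a in s, given successor risks v."""
        c = self.counts.get((s, a))
        if not c:
            return 1.0  # never observed: pessimistically assume the worst
        total = sum(c.values())
        # Successor states outside the abstraction count as unsafe (risk 1.0).
        return sum(n / total * v.get(s2, 1.0) for s2, n in c.items())

    def state_risks(self):
        """Finite-horizon value iteration: after k rounds, v[s] is the minimal
        estimated probability of reaching an unsafe state within k steps."""
        v = {s: 1.0 if s in self.unsafe else 0.0 for s in self.states}
        for _ in range(self.horizon):
            v = {
                s: 1.0 if s in self.unsafe
                else min(self._action_risk(s, a, v) for a in self.actions)
                for s in self.states
            }
        return v

    def correct(self, s, proposed):
        """Pass the controller's action through unless its estimated risk
        exceeds the threshold; otherwise substitute the least risky action."""
        v = self.state_risks()
        if self._action_risk(s, proposed, v) <= self.threshold:
            return proposed
        return min(self.actions, key=lambda a: self._action_risk(s, a, v))


# Toy traffic-flavoured usage (hypothetical states and actions).
shield = AdaptiveShield(["ok", "busy", "jam"], ["keep", "divert"], unsafe={"jam"})
observations = [("ok", "keep", "ok")] * 3 + [
    ("ok", "keep", "busy"), ("ok", "divert", "ok"),
    ("busy", "keep", "jam"), ("busy", "keep", "busy"), ("busy", "divert", "ok"),
]
for s, a, s2 in observations:
    shield.observe(s, a, s2)
print(shield.correct("busy", "keep"))  # divert: estimated risk 0.5 > 0.1

# New evidence that "keep" rarely causes a jam changes the shield's decision.
for _ in range(9):
    shield.observe("busy", "keep", "busy")
print(shield.correct("busy", "keep"))  # keep: estimated risk 1/11 <= 0.1
```

The second query shows the adaptive aspect in miniature: the controller and the shield stay separate, and only the shield's internal estimates change. Treating unobserved actions and unknown successor states as unsafe keeps the sketch conservative until the abstraction has gathered enough data. This is a design choice of the sketch, not something the abstract specifies.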