Title
Verified Probabilistic Policies for Deep Reinforcement Learning
Authors
Abstract
Deep reinforcement learning is an increasingly popular technique for synthesising policies to control an agent's interaction with its environment. There is also growing interest in formally verifying that such policies are correct and execute safely. Progress has been made in this area by building on existing work for verification of deep neural networks and of continuous-state dynamical systems. In this paper, we tackle the problem of verifying probabilistic policies for deep reinforcement learning, which are used, for example, to handle adversarial environments, break symmetries and manage trade-offs. We propose an abstraction approach, based on interval Markov decision processes, that yields probabilistic guarantees on a policy's execution, and present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking. We implement our approach and illustrate its effectiveness on a selection of reinforcement learning benchmarks.
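To give a feel for the kind of guarantee an interval Markov decision process (IMDP) abstraction provides, the sketch below runs bounded value iteration on a tiny hand-written IMDP: every transition probability is an interval [lo, hi], and resolving both the intervals and the action choice adversarially yields a sound lower bound on the probability of reaching a goal state. This is a minimal illustration of the general idea, not the paper's models or tooling; the states, actions and interval values are invented for the example.

```python
def worst_case_expectation(intervals, values):
    """Minimise sum_i p_i * values[i] over distributions p with
    lo_i <= p_i <= hi_i and sum_i p_i = 1. Greedy: start every p_i at
    its lower bound, then push the remaining slack mass onto the
    lowest-value successors first."""
    probs = [lo for lo, _ in intervals]
    slack = 1.0 - sum(probs)
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        lo, hi = intervals[i]
        bump = min(hi - lo, slack)
        probs[i] += bump
        slack -= bump
    return sum(p * v for p, v in zip(probs, values))

def imdp_lower_bound(transitions, goal, start, horizon):
    """Lower bound on Pr(reach `goal` within `horizon` steps) from
    `start`, minimising over both action choices and interval
    resolutions, so the bound holds for every concrete system the
    abstraction covers."""
    states = set(transitions) | {goal}
    V = {s: 1.0 if s == goal else 0.0 for s in states}
    for _ in range(horizon):
        V_new = dict(V)  # the goal state keeps value 1 (absorbing)
        for s, actions in transitions.items():
            V_new[s] = min(
                worst_case_expectation([succ[t] for t in succ],
                                       [V[t] for t in succ])
                for succ in actions.values()
            )
        V = V_new
    return V[start]

# Invented toy IMDP: state 0 is the initial abstract state, 1 the goal,
# 2 a failure sink with a self-loop.
transitions = {
    0: {"a": {1: (0.7, 0.9), 2: (0.1, 0.3)},
        "b": {0: (0.4, 0.6), 1: (0.3, 0.5), 2: (0.0, 0.2)}},
    2: {"stay": {2: (1.0, 1.0)}},
}

print(imdp_lower_bound(transitions, goal=1, start=0, horizon=10))
```

On this toy model the bound approaches 0.6 as the horizon grows: whatever policy is executed and however the interval uncertainty is resolved, the goal is reached with at least that probability. In the paper's setting, the IMDP itself is not hand-written but constructed from the learned policy and environment dynamics.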