Paper Title

Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning

Authors

Milad Kazemi, Sadegh Soudjani

Abstract

This paper studies satisfaction of temporal properties on unknown stochastic processes that have continuous state spaces. We show how reinforcement learning (RL) can be applied for computing policies that are finite-memory and deterministic using only the paths of the stochastic process. We address properties expressed in linear temporal logic (LTL) and use their automaton representation to give a path-dependent reward function maximised via the RL algorithm. We develop the required assumptions and theories for the convergence of the learned policy to the optimal policy in the continuous state space. To improve the performance of the learning on the constructed sparse reward function, we propose a sequential learning procedure based on a sequence of labelling functions obtained from the positive normal form of the LTL specification. We use this procedure to guide the RL algorithm towards a policy that converges to an optimal policy under suitable assumptions on the process. We demonstrate the approach on a 4-dim cart-pole system and 6-dim boat driving problem.
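The abstract's core construction is a path-dependent reward derived from the automaton representation of an LTL property: the automaton is stepped on the labels of the visited states, and reward is issued only on acceptance, which is why the resulting reward signal is sparse. The following minimal sketch illustrates this idea on a toy reach-avoid property over a one-dimensional state space; the automaton, the labelling thresholds, and the reward scheme are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch: a path-dependent reward obtained from a deterministic automaton
# for the reach-avoid property "avoid the unsafe region until the goal is reached".
# DFA states: q0 (trying), acc (accepting, absorbing), rej (rejecting, absorbing).
DFA = {
    "q0":  {"goal": "acc", "unsafe": "rej", "none": "q0"},
    "acc": {"goal": "acc", "unsafe": "acc", "none": "acc"},
    "rej": {"goal": "rej", "unsafe": "rej", "none": "rej"},
}

def label(x):
    """Illustrative labelling function over the continuous state space [0, 1]."""
    if x >= 0.8:
        return "goal"
    if x <= 0.1:
        return "unsafe"
    return "none"

def path_reward(path):
    """Sparse reward: 1 on the transition that first enters the accepting state."""
    q, rewards = "q0", []
    for x in path:
        q_next = DFA[q][label(x)]
        rewards.append(1.0 if (q_next == "acc" and q != "acc") else 0.0)
        q = q_next
    return rewards

# A path that stays out of the unsafe region and reaches the goal is paid once:
print(path_reward([0.5, 0.6, 0.9, 0.95]))  # -> [0.0, 0.0, 1.0, 0.0]
# A path that hits the unsafe region first earns nothing:
print(path_reward([0.5, 0.05, 0.9]))       # -> [0.0, 0.0, 0.0]
```

An RL agent would maximise the sum of these rewards over the product of the process state and the automaton state; the sequential learning procedure mentioned in the abstract mitigates exactly the sparsity visible here.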
