Funnel-based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning

Paper title

Funnel-based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning

Authors

Saxena, Naman; Sandeep, Gorantla; Jagtap, Pushpak

Abstract

Signal Temporal Logic (STL) is a powerful framework for describing the complex temporal and logical behaviour of dynamical systems. Numerous studies have attempted to employ reinforcement learning to learn a controller that enforces STL specifications; however, they have been unable to effectively tackle the challenges of ensuring robust satisfaction in continuous state space and maintaining tractability. In this paper, leveraging the concept of funnel functions, we propose a tractable reinforcement learning algorithm to learn a time-dependent policy for robust satisfaction of STL specifications in continuous state space. We demonstrate the utility of our approach on several STL tasks using different environments.
