A Composable Specification Language for Reinforcement Learning Tasks

Paper Title

A Composable Specification Language for Reinforcement Learning Tasks

Authors

Kishor Jothimurugan, Rajeev Alur, Osbert Bastani

Abstract

Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.
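The abstract describes compiling a task specification into a reward function that is also automatically shaped. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not SPECTRL's language or compilation algorithm. The constructs `Achieve`, `Ensuring`, and `Seq`, the helper `near`, and the function `robustness` are hypothetical names chosen for illustration. Each predicate maps a state to a real number that is positive when the predicate holds, and a finished trajectory is scored by a quantitative satisfaction value that is positive exactly when the trajectory satisfies the specification. Using that score as the episode's reward gives the learner graded feedback (it rises as the agent gets closer to a subgoal) instead of a 0/1 success bit.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

# Hypothetical illustration only: not SPECTRL's actual language or compiler.
State = Sequence[float]
# A predicate maps a state to a real value; it "holds" when the value is > 0.
Predicate = Callable[[State], float]


@dataclass
class Achieve:
    """Eventually visit a state where `goal` is positive."""
    goal: Predicate


@dataclass
class Ensuring:
    """Satisfy `task` while `safe` stays positive in every visited state."""
    task: "Spec"
    safe: Predicate


@dataclass
class Seq:
    """Satisfy `first` on a prefix, then `second` on the remaining suffix."""
    first: "Spec"
    second: "Spec"


Spec = Union[Achieve, Ensuring, Seq]


def near(target: State, radius: float) -> Predicate:
    """Predicate that is positive iff a state lies strictly within `radius` of `target`."""
    return lambda s: radius - math.dist(s, target)


def robustness(spec: Spec, traj: List[State]) -> float:
    """Quantitative score of a non-empty trajectory; > 0 exactly when `spec` is satisfied."""
    if isinstance(spec, Achieve):
        # Best goal value reached anywhere along the trajectory.
        return max(spec.goal(s) for s in traj)
    if isinstance(spec, Ensuring):
        # Must do the task AND keep the safety predicate positive at every step.
        return min(robustness(spec.task, traj), min(spec.safe(s) for s in traj))
    if isinstance(spec, Seq):
        # Try every split point k; state k ends the first part and starts the second.
        return max(
            min(robustness(spec.first, traj[: k + 1]), robustness(spec.second, traj[k:]))
            for k in range(len(traj))
        )
    raise TypeError(f"unknown spec: {spec!r}")


# Reach (1, 0), then (1, 1), while keeping the y-coordinate above -0.5.
spec = Ensuring(
    Seq(Achieve(near((1.0, 0.0), 0.5)), Achieve(near((1.0, 1.0), 0.5))),
    safe=lambda s: s[1] + 0.5,
)
print(robustness(spec, [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))  # prints 0.5
print(robustness(spec, [(0.0, 0.0), (1.0, 1.0)]))              # prints -0.5
```

The first trajectory visits the region around (1, 0) before the region around (1, 1) and stays safe, so it scores 0.5; the second skips the first region and scores -0.5. A trajectory that passes closer to (1, 0) without entering it scores between the two, which is the shaping effect. Scoring `Seq` by maximizing over every split point grows at least quadratically with trajectory length, which is acceptable for a sketch but would call for a more efficient, incremental evaluation on long episodes.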
