Paper Title
A Composable Specification Language for Reinforcement Learning Tasks
Authors
Abstract
Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.