Paper Title
Learning Soft Constraints From Constrained Expert Demonstrations
Paper Authors
Paper Abstract
Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function. However, in many settings, the agent may instead optimize a reward function subject to constraints, where the constraints induce behaviors that would be difficult to express through the reward function alone. We consider the setting where the reward function is given and the constraints are unknown, and we propose a method that can recover these constraints satisfactorily from the expert data. While previous work has focused on recovering hard constraints, our method recovers cumulative soft constraints that the agent satisfies on average per episode. In IRL fashion, our method solves this problem by iteratively adjusting the constraint function through a constrained optimization procedure, until the agent's behavior matches the expert's behavior. We demonstrate our approach on synthetic environments, robotics environments, and real-world highway driving scenarios.
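To make the high-level loop in the abstract concrete, below is a minimal Python sketch of one way such an iterative scheme could look in a tabular setting. Everything environment-specific here is an illustrative assumption rather than the paper's actual setup: the chain MDP, the horizon, the budget `beta`, the soft (maximum-entropy) value iteration used as the policy optimizer, the Lagrangian penalty with multiplier `lam` standing in for the constrained optimization procedure, and the synthetic "expert" generated from a hand-picked true cost.

```python
# Minimal sketch of IRL-style soft-constraint learning in a tiny tabular MDP.
# Assumptions (not from the paper): chain dynamics, finite horizon, a known
# reward, a Lagrangian penalty as the constrained-RL solver, and a budget
# `beta` on expected cumulative cost per episode (the "soft" constraint).
import numpy as np

n_states, n_actions, horizon = 5, 2, 8
beta = 1.0  # soft budget: expected cumulative cost per episode <= beta

# Deterministic chain dynamics: action 0 moves left, action 1 moves right.
def step(s, a):
    return min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)

reward = np.zeros((n_states, n_actions))
reward[:, 1] = 1.0                 # moving right is rewarding...
true_cost = np.zeros((n_states, n_actions))
true_cost[n_states - 2, 1] = 1.0   # ...but entering the last state is costly

def solve_soft(r):
    """Finite-horizon soft (maximum-entropy) value iteration; returns a policy."""
    pi = np.zeros((horizon, n_states, n_actions))
    v = np.zeros(n_states)
    for t in reversed(range(horizon)):
        q = r + np.array([[v[step(s, a)] for a in range(n_actions)]
                          for s in range(n_states)])
        m = q.max(axis=1, keepdims=True)
        pi[t] = np.exp(q - m)
        pi[t] /= pi[t].sum(axis=1, keepdims=True)
        v = np.log(np.exp(q - m).sum(axis=1)) + m[:, 0]
    return pi

def occupancy(pi):
    """Expected state-action visitation counts over one episode from state 0."""
    d = np.zeros((n_states, n_actions))
    p = np.zeros(n_states)
    p[0] = 1.0
    for t in range(horizon):
        d += p[:, None] * pi[t]
        p_next = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                p_next[step(s, a)] += p[s] * pi[t][s, a]
        p = p_next
    return d

# Synthetic "expert": solves the true constrained problem via a fixed penalty.
lam_true = 2.0
expert_occ = occupancy(solve_soft(reward - lam_true * true_cost))

# Learner: alternate constrained policy optimization with cost-function updates.
cost = np.zeros((n_states, n_actions))  # learned constraint (cost) function
lam, lr_cost, lr_lam = 1.0, 0.5, 0.1
for it in range(200):
    agent_occ = occupancy(solve_soft(reward - lam * cost))
    # Dual ascent on lam keeps the agent's expected episode cost near beta.
    lam = max(0.0, lam + lr_lam * ((agent_occ * cost).sum() - beta))
    # IRL-style update: raise the cost where the agent visits more than the
    # expert, lower it where the expert visits more, until behaviors match.
    cost = np.clip(cost + lr_cost * (agent_occ - expert_occ), 0.0, 5.0)

print("learned cost:\n", cost.round(2))
print("true cost:\n", true_cost)
```

The dual variable `lam` is adjusted by dual ascent so that the agent's expected cumulative cost per episode stays near the budget, which is one standard way to handle a soft constraint in constrained RL; the paper's actual constrained optimization procedure may differ in its details.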