Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration

Paper title

Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration

Authors

Matthias Mayr, Faseeh Ahmad, Konstantinos Chatzilygeroudis, Luigi Nardi, Volker Krueger

Abstract

In modern industrial settings with small batch sizes it should be easy to set up a robot system for a new task. Strategies exist, e.g. the use of skills, but when it comes to handling forces and torques, these systems often fall short. We introduce an approach that provides a combination of task-level planning with targeted learning of scenario-specific parameters for skill-based systems. We propose the following pipeline: (1) the user provides a task goal in the planning language PDDL, (2) a plan (i.e., a sequence of skills) is generated and the learnable parameters of the skills are automatically identified. An operator then chooses (3) reward functions and hyperparameters for the learning process. Two aspects of our methodology are critical: (a) learning is tightly integrated with a knowledge framework to support symbolic planning and to provide priors for learning, (b) using multi-objective optimization. This can help to balance key performance indicators (KPIs) such as safety and task performance since they can often affect each other. We adopt a multi-objective Bayesian optimization approach and learn entirely in simulation. We demonstrate the efficacy and versatility of our approach by learning skill parameters for two different contact-rich tasks. We show their successful execution on a real 7-DOF KUKA-iiwa manipulator and outperform the manual parameterization by human robot operators.
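The multi-objective trade-off mentioned in the abstract can be illustrated with a minimal sketch of Pareto filtering. This is not the paper's method: the abstract describes multi-objective Bayesian optimization, whereas the snippet below only shows how non-dominated candidates are selected once their rewards are known. The candidate labels, the two reward names, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate: (label, task_reward, safety_reward).
# Both rewards are assumed to be maximized; the values are made up.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both rewards
    and strictly better on at least one."""
    no_worse = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative skill-parameter sets with made-up reward values.
candidates: List[Candidate] = [
    ("stiff", 0.9, 0.2),
    ("balanced", 0.7, 0.7),
    ("compliant", 0.3, 0.9),
    ("poor", 0.25, 0.5),
]

front = pareto_front(candidates)
print([label for label, _, _ in front])
```

In this toy data the "poor" candidate is dominated by "balanced" on both rewards and is dropped, while the remaining three represent different trade-offs between task performance and safety from which an operator could choose.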
