Paper Title


Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

Authors

Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier

Abstract


Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory which is only valid within the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following on from this line of research, we propose several improvements on top of these approaches to learn global control policies quicker, notably by leveraging sensitivity information stemming from TO methods via Sobolev learning, and augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical tasks in robotics through comparison with existing approaches in the literature.
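
The abstract names two ingredients: Sobolev learning, which fits the policy to both the TO actions and their sensitivities, and augmented-Lagrangian terms that enforce consensus between the TO solution and the policy output. The sketch below is only an illustration of how such a combined loss could look, not the authors' implementation; the network shape, the placeholder data (`xs`, `a_ref`, `J_ref`), and the hyperparameters are all hypothetical, and the sensitivities would in practice come from a differentiable TO solver rather than random tensors.

```python
# Illustrative sketch only (not the paper's code): fit a policy to TO data with
# (i) a Sobolev term matching the optimal action a* and its sensitivity da*/dx,
# and (ii) an augmented-Lagrangian consensus term on the action residual.
import torch
import torch.nn as nn

state_dim, act_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Hypothetical placeholder data standing in for a TO solver's output:
# states, locally optimal actions, and sensitivities J_ref[i] = d a*_i / d x_i.
xs = torch.randn(32, state_dim)
a_ref = torch.randn(32, act_dim)
J_ref = torch.randn(32, act_dim, state_dim)

lam = torch.zeros(32, act_dim)   # augmented-Lagrangian multipliers
rho = 10.0                       # penalty weight

for epoch in range(100):
    a = policy(xs)
    # Policy Jacobians d pi(x)/dx, computed per sample for clarity.
    J = torch.stack([
        torch.autograd.functional.jacobian(policy, x, create_graph=True)
        for x in xs
    ])
    residual = a - a_ref
    # Sobolev term: match both the TO action and its sensitivity.
    sobolev = (residual ** 2).sum(-1).mean() \
        + ((J - J_ref) ** 2).sum(dim=(-2, -1)).mean()
    # Augmented-Lagrangian consensus term on the action residual.
    aug_lag = (lam * residual).sum(-1).mean() \
        + 0.5 * rho * (residual ** 2).sum(-1).mean()
    loss = sobolev + aug_lag
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Dual ascent on the multipliers; in a full method this update would follow
    # (partial) convergence of the inner minimization, simplified here.
    with torch.no_grad():
        lam += rho * (policy(xs) - a_ref)
```

The intent of the sketch is only to show how the gradient-matching (Sobolev) loss and the consensus multipliers coexist in one objective; the actual coupling with the TO solver and the RL machinery described in the paper is more involved.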
