Online Damage Recovery for Physical Robots with Hierarchical Quality-Diversity

Paper Title

Online Damage Recovery for Physical Robots with Hierarchical Quality-Diversity

Authors

Maxime Allard, Simón C. Smith, Konstantinos Chatzilygeroudis, Bryan Lim, Antoine Cully

Abstract

In real-world environments, robots need to be resilient to damage and robust to unforeseen scenarios. Quality-Diversity (QD) algorithms have been successfully used to make robots adapt to damage in seconds by leveraging a diverse set of learned skills. A high diversity of skills increases a robot's chances of overcoming new situations, since there are more potential alternatives for solving a new task. However, finding and storing a large behavioural diversity of multiple skills often leads to an increase in computational complexity. Furthermore, robot planning in a large skill space is an additional challenge that arises as the number of skills grows. Hierarchical structures can help reduce this search and storage complexity by breaking down skills into primitive skills. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot adapt quickly in the physical world. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. Experiments with a hexapod robot show that, in the most challenging scenarios, our method solves a maze navigation task with 20% fewer actions in simulation and 43% fewer actions in the physical world than the best baselines, while having 78% fewer complete failures.
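To make the idea of a hierarchical behavioural repertoire concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grid-style archive that keeps one best solution per discretised behaviour cell, with descriptors in [0, 1]. The names `Repertoire`, `cell_of`, `add`, and `expand`, and all numeric values, are hypothetical and chosen only for illustration. The low level stores raw controller parameters, while the high level stores only sequences of references to low-level cells, which is one way the decomposition can reduce storage.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Cell = Tuple[int, ...]


@dataclass
class Repertoire:
    """Toy grid archive: one elite per discretised behaviour-descriptor cell."""
    bins: int  # cells per descriptor dimension (descriptors assumed in [0, 1])
    elites: Dict[Cell, Tuple[float, object]] = field(default_factory=dict)

    def cell_of(self, descriptor) -> Cell:
        # Map each descriptor coordinate to a bin index, clamping 1.0 into the last bin.
        return tuple(min(int(d * self.bins), self.bins - 1) for d in descriptor)

    def add(self, descriptor, fitness, solution) -> bool:
        # Keep the candidate only if its cell is empty or it beats the current elite.
        cell = self.cell_of(descriptor)
        current = self.elites.get(cell)
        if current is None or fitness > current[0]:
            self.elites[cell] = (fitness, solution)
            return True
        return False


# Low level (hypothetical data): primitive skills store raw controller parameters.
primitives = Repertoire(bins=4)
primitives.add((0.1, 0.1), 0.5, [0.2, -0.3, 0.9])
primitives.add((0.9, 0.1), 0.7, [0.8, 0.1, -0.4])

# High level: a skill is a sequence of primitive cells, not raw parameters.
skills = Repertoire(bins=4)
skills.add((0.5, 0.5), 0.6, [(0, 0), (3, 0)])


def expand(skill_cell: Cell):
    """Resolve a high-level skill into the controller parameters of its primitives."""
    _, sequence = skills.elites[skill_cell]
    return [primitives.elites[c][1] for c in sequence]
```

Calling `expand((2, 2))` resolves the single high-level skill into the two stored primitive parameter lists. A trial-and-error adaptation loop would sit on top of such a structure by choosing which skill to try next; that part is not sketched here.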
