Paper Title
NavRep: Unsupervised Representations for Reinforcement Learning of Robot Navigation in Dynamic Human Environments
Paper Authors
Paper Abstract
Robot navigation is a task where reinforcement learning approaches are still unable to compete with traditional path planning. State-of-the-art methods differ in small ways, and do not all provide reproducible, openly available implementations. This makes comparing methods a challenge. Recent research has shown that unsupervised learning methods can scale impressively, and be leveraged to solve difficult problems. In this work, we design ways in which unsupervised learning can be used to assist reinforcement learning for robot navigation. We train two end-to-end and 18 unsupervised-learning-based architectures, and compare them, along with existing approaches, in unseen test cases. We demonstrate our approach working on a real-life robot. Our results show that unsupervised learning methods are competitive with end-to-end methods. We also highlight the importance of various components such as input representation, predictive unsupervised learning, and latent features. We make all our models publicly available, as well as training and testing environments, and tools. This release also includes OpenAI-gym-compatible environments designed to emulate the training conditions described by other papers, with as much fidelity as possible. Our hope is that this helps in bringing together the field of RL for robot navigation, and allows meaningful comparisons across state-of-the-art methods.
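The abstract highlights that the released environments are OpenAI-Gym compatible. As a rough illustration of what that compatibility implies, the sketch below drives a gym environment through the standard reset/step loop with a random placeholder policy, using the classic (pre-0.26) gym API that was current when the paper was published. The environment id here is a generic stand-in for runnability, not one of the paper's released NavRep environments.

```python
# Minimal sketch of the standard OpenAI Gym interaction loop.
# Assumption: a NavRep-style environment exposes the same
# reset()/step()/action_space interface, so the same loop applies.
import gym

# Stand-in env id so the snippet runs anywhere; a released navigation
# environment would be constructed or registered by the NavRep package.
env = gym.make("CartPole-v1")

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # classic gym API
    total_reward += reward
env.close()
print("episode return:", total_reward)
```

A trained RL policy (or one of the paper's baselines) would simply replace the `action_space.sample()` call, mapping the observation to an action at each step.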