Paper Title
Interactive Imitation Learning in Robotics: A Survey
Paper Authors
Paper Abstract
Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL) in which human feedback is provided intermittently during robot execution, allowing an online improvement of the robot's behavior. In recent years, IIL has increasingly started to carve out its own space as a promising data-driven alternative for solving complex robotic tasks. The advantages of IIL are its data efficiency, as the human feedback guides the robot directly towards an improved behavior, and its robustness, as the distribution mismatch between the teacher and learner trajectories is minimized by providing feedback directly over the learner's trajectories. Nevertheless, despite the opportunities that IIL presents, its terminology, structure, and applicability are neither clear nor unified in the literature, slowing down its development and, therefore, the research of innovative formulations and discoveries. In this article, we attempt to facilitate research in IIL and lower entry barriers for new practitioners by providing a survey of the field that unifies and structures it. In addition, we aim to raise awareness of its potential, of what has been accomplished, and of the research questions that remain open. We organize the most relevant works in IIL in terms of human-robot interaction (i.e., types of feedback), interfaces (i.e., means of providing feedback), learning (i.e., models learned from feedback and function approximators), user experience (i.e., human perception of the learning process), applications, and benchmarks. Furthermore, we analyze similarities and differences between IIL and Reinforcement Learning (RL), providing a discussion on how the concepts of offline, online, off-policy, and on-policy learning should be transferred to IIL from the RL literature. We particularly focus on robotic applications in the real world and discuss their implications, limitations, and promising future areas of research.