Title

Verifying Learning-Based Robotic Navigation Systems

Authors

Guy Amir, Davide Corsi, Raz Yerushalmi, Luca Marzari, David Harel, Alessandro Farinelli, Guy Katz

Abstract

Deep reinforcement learning (DRL) has become a dominant deep-learning paradigm for tasks where complex policies are learned within reactive systems. Unfortunately, these policies are known to be susceptible to bugs. Despite significant progress in DNN verification, there has been little work demonstrating the use of modern verification tools on real-world, DRL-controlled systems. In this case study, we attempt to begin bridging this gap, and focus on the important task of mapless robotic navigation -- a classic robotics problem, in which a robot, usually controlled by a DRL agent, needs to efficiently and safely navigate through an unknown arena towards a target. We demonstrate how modern verification engines can be used for effective model selection, i.e., selecting the best available policy for the robot in question from a pool of candidate policies. Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior, such as collisions and infinite loops. We also apply verification to identify models with overly conservative behavior, thus allowing users to choose superior policies, which might be better at finding shorter paths to a target. To validate our work, we conducted extensive experiments on an actual robot, and confirmed that the suboptimal policies detected by our method were indeed flawed. We also demonstrate the superiority of our verification-driven approach over state-of-the-art gradient attacks. Our work is the first to establish the usefulness of DNN verification in identifying and filtering out suboptimal DRL policies in real-world robots, and we believe that the methods presented here are applicable to a wide range of systems that incorporate deep-learning-based agents.
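The model-selection idea described in the abstract -- query a verifier with "can this policy ever pick an unsafe action over some input region?" and discard policies for which the answer is yes -- can be illustrated with a toy sketch. This is not the paper's actual toolchain (the authors use a full DNN verification engine); here we only use naive interval bound propagation over a small ReLU network, and all names (`interval_bounds`, `may_select_unsafe`) are illustrative:

```python
import numpy as np

def interval_bounds(layers, lo, hi):
    """Propagate the input box [lo, hi] through a ReLU network given as a
    list of (weight, bias) pairs; the last layer has no activation."""
    for W, b in layers[:-1]:
        # Split weights by sign to soundly bound the affine map over a box.
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU
    W, b = layers[-1]
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def may_select_unsafe(policy, lo, hi, unsafe_action):
    """Conservative check: over the input box, can the unsafe action's
    score ever exceed every other action's score? If True, the policy
    *may* choose the unsafe action somewhere in the region (the bounds
    are sound but not exact, so True can be a false alarm; False is a
    proof of safety for this region)."""
    out_lo, out_hi = interval_bounds(policy, lo, hi)
    others = [out_lo[i] for i in range(len(out_lo)) if i != unsafe_action]
    return bool(out_hi[unsafe_action] > max(others))

# Model selection: keep only candidate policies proven safe on the region.
def filter_policies(candidates, lo, hi, unsafe_action):
    return [p for p in candidates
            if not may_select_unsafe(p, lo, hi, unsafe_action)]
```

A real verification engine would answer the same kind of query exactly (and also handle the paper's liveness-style properties, such as infinite loops, which require reasoning over multiple steps); interval propagation only gives a cheap, sound over-approximation, which is enough to convey the filtering workflow.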
