Paper Title
Visual Backtracking Teleoperation: A Data Collection Protocol for Offline Image-Based Reinforcement Learning
Paper Authors
Paper Abstract
We consider how to most efficiently leverage teleoperator time to collect data for learning robust image-based value functions and policies for sparse reward robotic tasks. To accomplish this goal, we modify the process of data collection to include more than just successful demonstrations of the desired task. Instead we develop a novel protocol that we call Visual Backtracking Teleoperation (VBT), which deliberately collects a dataset of visually similar failures, recoveries, and successes. VBT data collection is particularly useful for efficiently learning accurate value functions from small datasets of image-based observations. We demonstrate VBT on a real robot to perform continuous control from image observations for the deformable manipulation task of T-shirt grasping. We find that by adjusting the data collection process we improve the quality of both the learned value functions and policies over a variety of baseline methods for data collection. Specifically, we find that offline reinforcement learning on VBT data outperforms standard behavior cloning on successful demonstration data by 13% when both methods are given equal-sized datasets of 60 minutes of data from the real robot.
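Below is a minimal, hypothetical sketch (not code from the paper) of how a single VBT episode might be organized: a deliberately collected failure segment, a backtracking recovery segment, and a success segment, all labeled with a sparse reward before being added to an offline RL dataset. The class and method names (`Transition`, `VBTEpisode`, `to_offline_buffer`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: structuring one VBT teleoperation episode into
# visually similar failure, recovery, and success segments, and labeling
# it with a sparse reward for use in an offline RL replay buffer.
from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class Transition:
    """One step of teleoperated data: image observation, action, sparse reward."""
    observation: np.ndarray   # e.g. an RGB image from the robot's camera
    action: np.ndarray        # continuous control command from the teleoperator
    reward: float = 0.0       # sparse: 1.0 only on task success, else 0.0
    done: bool = False


@dataclass
class VBTEpisode:
    """A single VBT episode: deliberate failure, backtracking recovery, then success."""
    failure: List[Transition] = field(default_factory=list)
    recovery: List[Transition] = field(default_factory=list)
    success: List[Transition] = field(default_factory=list)

    def to_offline_buffer(self) -> List[Transition]:
        # Concatenate the three phases in order; under a sparse-reward
        # formulation only the final successful step carries reward 1.0.
        transitions = self.failure + self.recovery + self.success
        if transitions:
            transitions[-1].reward = 1.0
            transitions[-1].done = True
        return transitions
```

The design intent this sketch tries to capture is that failures and recoveries share visually similar observations with successes, so a value function trained offline on the concatenated transitions sees both low-value (failed grasp) and high-value (successful grasp) states that look alike, which the abstract argues is what makes small image-based datasets informative.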