From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Paper Title

From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Authors

Cui, Zichen Jeff; Wang, Yibin; Shafiullah, Nur Muhammad Mahi; Pinto, Lerrel

Abstract

While large-scale sequence modeling from offline data has led to impressive performance gains in natural language and image generation, directly translating such ideas to robotics has been challenging. One critical reason for this is that uncurated robot demonstration data, i.e. play data, collected from non-expert human demonstrators are often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data without any task labels or reward information. Robot videos are best viewed on our project website: https://play-to-policy.github.io
