Exploratory Grasping: Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects

Paper Title

Exploratory Grasping: Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects

Authors

Michael Danielczuk, Ashwin Balakrishna, Daniel S. Brown, Shivin Devgon, Ken Goldberg

Abstract

There has been significant recent work on data-driven algorithms for learning general-purpose grasping policies. However, these policies can consistently fail to grasp challenging objects which are significantly out of the distribution of objects in the training data or which have very few high quality grasps. Motivated by such objects, we propose a novel problem setting, Exploratory Grasping, for efficiently discovering reliable grasps on an unknown polyhedral object via sequential grasping, releasing, and toppling. We formalize Exploratory Grasping as a Markov Decision Process, study the theoretical complexity of Exploratory Grasping in the context of reinforcement learning and present an efficient bandit-style algorithm, Bandits for Online Rapid Grasp Exploration Strategy (BORGES), which leverages the structure of the problem to efficiently discover high performing grasps for each object stable pose. BORGES can be used to complement any general-purpose grasping algorithm with any grasp modality (parallel-jaw, suction, multi-fingered, etc) to learn policies for objects in which they exhibit persistent failures. Simulation experiments suggest that BORGES can significantly outperform both general-purpose grasping pipelines and two other online learning algorithms and achieves performance within 5% of the optimal policy within 1000 and 8000 timesteps on average across 46 challenging objects from the Dex-Net adversarial and EGAD! object datasets, respectively. Initial physical experiments suggest that BORGES can improve grasp success rate by 45% over a Dex-Net baseline with just 200 grasp attempts in the real world. See https://tinyurl.com/exp-grasping for supplementary material and videos.
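
The abstract describes a bandit-style learner that searches for a reliable grasp in each stable pose of the object. The sketch below illustrates only that general structure and is not the authors' BORGES implementation. It assumes Beta-Bernoulli Thompson sampling as the per-pose bandit, a fixed distribution over stable poses after each release, and a hypothetical `topple_after` rule that re-draws the pose after several consecutive failures. The function name, parameters, and all numbers are illustrative assumptions.

```python
import random


def run_per_pose_bandit(success_probs, pose_dist, steps, topple_after=10, seed=0):
    """Toy simulation of per-pose grasp exploration (illustrative only).

    success_probs[p][g] is the hidden success probability of grasp g in
    stable pose p; pose_dist[p] is the assumed probability that the object
    lands in pose p after a release or a topple.
    """
    rng = random.Random(seed)
    poses = list(range(len(success_probs)))
    # One Beta(1, 1) prior per (pose, grasp) pair.
    alpha = [[1.0] * len(p) for p in success_probs]
    beta = [[1.0] * len(p) for p in success_probs]

    pose = rng.choices(poses, weights=pose_dist)[0]
    successes = 0
    consecutive_failures = 0
    for _ in range(steps):
        # Thompson sampling: draw one sample per grasp, pick the largest.
        samples = [
            rng.betavariate(alpha[pose][g], beta[pose][g])
            for g in range(len(success_probs[pose]))
        ]
        grasp = max(range(len(samples)), key=samples.__getitem__)

        if rng.random() < success_probs[pose][grasp]:
            alpha[pose][grasp] += 1
            successes += 1
            consecutive_failures = 0
            # Successful grasp: release the object into a new stable pose.
            pose = rng.choices(poses, weights=pose_dist)[0]
        else:
            beta[pose][grasp] += 1
            consecutive_failures += 1
            # Assumed rule: topple the object after repeated failures.
            if consecutive_failures >= topple_after:
                pose = rng.choices(poses, weights=pose_dist)[0]
                consecutive_failures = 0

    # Report the grasp with the highest posterior mean in each pose.
    means = [[a / (a + b) for a, b in zip(ar, br)] for ar, br in zip(alpha, beta)]
    best = [max(range(len(m)), key=m.__getitem__) for m in means]
    return successes, best


if __name__ == "__main__":
    probs = [[0.1, 0.9], [0.8, 0.05]]  # made-up values for two poses
    total, best = run_per_pose_bandit(probs, [0.5, 0.5], steps=2000)
    print(total, best)
```

Keeping a separate posterior for each pose reflects the idea that a grasp's quality depends on how the object is resting, so evidence gathered in one pose is not reused in another.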
