Sim2Sim Evaluation of a Novel Data-Efficient Differentiable Physics Engine for Tensegrity Robots

Paper Title

Sim2Sim Evaluation of a Novel Data-Efficient Differentiable Physics Engine for Tensegrity Robots

Authors

Kun Wang, Mridul Aanjaneya, Kostas Bekris

Abstract

Learning policies in simulation promises to reduce human effort when training robot controllers. This is especially true for soft robots, which are more adaptive and safer but also more difficult to model and control accurately. The sim2real gap is the main barrier to successfully transferring policies from simulation to a real robot. System identification can be applied to reduce this gap, but traditional identification methods require a lot of manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which in turn demands human effort for data collection. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors' knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation that can learn approximate dynamics from limited ground truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as the ground truth system. Two different tensegrity robots, a 6-bar and a 3-bar tensegrity, are used for evaluation and for learning locomotion policies. The results indicate that training a policy with the differentiable engine requires only 0.25% of the ground truth data needed to train the policy directly on the ground truth system, while still producing a policy that works on the ground truth system.
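
The idea of identifying a simulator's parameters by differentiating through it can be illustrated with a toy sketch. The example below is not the authors' engine and makes strong simplifying assumptions: a single 1D mass on a spring-damper stands in for one cable, the "ground truth" is the same integrator run with a hidden stiffness rather than MuJoCo, and the gradient is propagated by hand-derived sensitivities instead of an autodiff framework. All names (`simulate`, `fit_stiffness`) and constants are hypothetical.

```python
# Toy illustration only: a 1D mass on a spring-damper stands in for one cable.
# All names and constants are hypothetical and are not taken from the paper.


def simulate(k, steps=100, dt=0.01, m=1.0, c=0.5, x0=1.0):
    """Semi-implicit Euler rollout.

    Returns the positions and their derivatives with respect to the
    stiffness k, obtained by differentiating every integration step.
    """
    x, v = x0, 0.0
    dx, dv = 0.0, 0.0  # sensitivities dx/dk and dv/dk
    xs, dxs = [], []
    for _ in range(steps):
        a = (-k * x - c * v) / m            # spring-damper acceleration
        da = (-x - k * dx - c * dv) / m     # derivative of a w.r.t. k
        v, dv = v + dt * a, dv + dt * da    # velocity update first
        x, dx = x + dt * v, dx + dt * dv    # then position with new velocity
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def fit_stiffness(target, k_init, lr=20.0, iters=500):
    """Gradient descent on the mean squared position error."""
    k = k_init
    n = len(target)
    for _ in range(iters):
        xs, dxs = simulate(k, steps=n)
        grad = sum(2.0 * (x - t) * d for x, t, d in zip(xs, target, dxs)) / n
        k -= lr * grad
    return k


# One short trajectory from the "ground truth" system (hidden stiffness 20.0).
ground_truth, _ = simulate(20.0)
k_fit = fit_stiffness(ground_truth, k_init=10.0)
print(f"recovered stiffness: {k_fit:.3f}")
```

A single one-second trajectory is enough here because the model structure is known and only one parameter is unknown. A full tensegrity engine would have many more parameters, plus contact and actuation terms that this sketch omits.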
