Paper Title
ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters
Paper Authors
Paper Abstract
The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to common practice in physics-based character animation, where control policies are typically trained from scratch for each task. In this work, we present a large-scale data-driven framework for learning versatile and reusable skill embeddings for physically simulated characters. Our approach combines techniques from adversarial imitation learning and unsupervised reinforcement learning to develop skill embeddings that produce life-like behaviors, while also providing an easy-to-control representation for use on new downstream tasks. Our models can be trained using large datasets of unstructured motion clips, without requiring any task-specific annotation or segmentation of the motion data. By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experience, enabling our model to learn a rich and versatile repertoire of skills. We show that a single pre-trained model can be effectively applied to perform a diverse set of new tasks. Our system also allows users to specify tasks through simple reward functions, and the skill embedding then enables the character to automatically synthesize complex and naturalistic strategies to achieve the task objectives.
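The abstract describes a two-level design: a low-level policy is pre-trained to decode points in the skill embedding into life-like motions, and a lightweight high-level policy is later trained per task to steer the character by emitting latents. The following is a minimal PyTorch sketch of that interface only; the network sizes, observation/action dimensions, and module names (LowLevelPolicy, HighLevelPolicy) are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    LATENT_DIM = 64  # dimension of the skill embedding z (assumed for illustration)

    class LowLevelPolicy(nn.Module):
        """pi(a | s, z): maps state plus skill latent to an action.

        Pre-trained once on the motion dataset, then reused across tasks.
        """
        def __init__(self, obs_dim: int, act_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + LATENT_DIM, 1024), nn.ReLU(),
                nn.Linear(1024, 512), nn.ReLU(),
                nn.Linear(512, act_dim),
            )

        def forward(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([obs, z], dim=-1))

    class HighLevelPolicy(nn.Module):
        """omega(z | s): selects a point in the skill embedding for the task."""
        def __init__(self, obs_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 512), nn.ReLU(),
                nn.Linear(512, LATENT_DIM),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            # Project onto the unit hypersphere so the high-level outputs stay
            # inside the bounded latent space the low-level policy was trained
            # on (a convention of this sketch, following the paper's design).
            return nn.functional.normalize(self.net(obs), dim=-1)

    obs_dim, act_dim = 105, 28      # humanoid sizes, assumed for illustration
    low = LowLevelPolicy(obs_dim, act_dim)
    high = HighLevelPolicy(obs_dim)

    obs = torch.randn(4, obs_dim)   # a batch of 4 simulated characters
    z = high(obs)                   # task-driven skill selection
    action = low(obs, z)            # pre-trained skills decode z into motion
    print(action.shape)             # torch.Size([4, 28])

Under this reading, pre-training optimizes only the low-level policy with the adversarial imitation and unsupervised skill-discovery objectives; on a downstream task the low-level weights are frozen and only the high-level policy is trained against the user's reward function, which is what makes the embedding reusable across tasks.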