Paper Title

Modeling Human Driving Behavior through Generative Adversarial Imitation Learning

Paper Authors

Raunak Bhattacharyya, Blake Wulfe, Derek Phillips, Alex Kuefler, Jeremy Morton, Ransalu Senanayake, Mykel Kochenderfer

Paper Abstract

An open problem in autonomous vehicle safety validation is building reliable models of human driving behavior in simulation. This work presents an approach to learn neural driving policies from real world driving demonstration data. We model human driving as a sequential decision making problem that is characterized by non-linearity and stochasticity, and unknown underlying cost functions. Imitation learning is an approach for generating intelligent behavior when the cost function is unknown or difficult to specify. Building upon work in inverse reinforcement learning (IRL), Generative Adversarial Imitation Learning (GAIL) aims to provide effective imitation even for problems with large or continuous state and action spaces, such as modeling human driving. This article describes the use of GAIL for learning-based driver modeling. Because driver modeling is inherently a multi-agent problem, where the interaction between agents needs to be modeled, this paper describes a parameter-sharing extension of GAIL called PS-GAIL to tackle multi-agent driver modeling. In addition, GAIL is domain agnostic, making it difficult to encode specific knowledge relevant to driving in the learning process. This paper describes Reward Augmented Imitation Learning (RAIL), which modifies the reward signal to provide domain-specific knowledge to the agent. Finally, human demonstrations are dependent upon latent factors that may not be captured by GAIL. This paper describes Burn-InfoGAIL, which allows for disentanglement of latent variability in demonstrations. Imitation learning experiments are performed using NGSIM, a real-world highway driving dataset. Experiments show that these modifications to GAIL can successfully model highway driving behavior, accurately replicating human demonstrations and generating realistic, emergent behavior in the traffic flow arising from the interaction between driving agents.
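The abstract summarizes the adversarial imitation setup at a high level. As a concrete illustration of the core GAIL update it refers to, below is a minimal sketch: a discriminator is trained to separate expert (state, action) pairs from policy rollouts, and its output supplies the surrogate reward passed to the policy optimizer (TRPO in the paper). This is not the authors' code; the feature dimensions, network sizes, synthetic stand-in batches, and all names (Discriminator, surrogate_reward, etc.) are assumptions made purely for illustration.

```python
# Illustrative sketch of a GAIL-style discriminator update and surrogate reward.
# Assumptions: feature/action sizes, network architecture, and synthetic data
# are placeholders; real inputs would come from NGSIM demonstrations and from
# rolling out the current policy in a traffic simulator.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 51, 2  # assumed sizes (e.g. surrounding-vehicle features; accel/steer)

class Discriminator(nn.Module):
    """Scores (state, action) pairs; a high score means 'looks like expert driving'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))  # raw logits

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Stand-in batches of expert and policy (state, action) pairs.
expert_s, expert_a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
policy_s, policy_a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)

# Discriminator step: expert pairs labeled 1, policy pairs labeled 0.
logits_exp = disc(expert_s, expert_a)
logits_pol = disc(policy_s, policy_a)
loss = bce(logits_exp, torch.ones_like(logits_exp)) + \
       bce(logits_pol, torch.zeros_like(logits_pol))
opt.zero_grad()
loss.backward()
opt.step()

# Surrogate reward for the policy optimizer: the policy is rewarded when the
# discriminator mistakes its (state, action) pairs for expert data. RAIL, as
# described in the abstract, would add domain-specific penalty terms here.
with torch.no_grad():
    d_pol = torch.sigmoid(disc(policy_s, policy_a))
    surrogate_reward = -torch.log(1.0 - d_pol + 1e-8)
```

In this sketch, PS-GAIL would correspond to running many such driving agents in the same simulation while sharing a single policy's parameters across them, and Burn-InfoGAIL would additionally condition the policy on a latent style variable inferred from the burn-in portion of each demonstration.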
