Paper title
Learning to drive via Apprenticeship Learning and Deep Reinforcement Learning
Paper authors
Abstract
With the adoption of reinforcement learning (RL) algorithms, current state-of-the-art autonomous vehicle technology has the potential to move closer to full automation. However, most applications have been limited to game domains or discrete action spaces, which are far from real-world driving. Moreover, the parameters of the reward mechanism are difficult to tune because driving styles vary widely among users. For instance, an aggressive driver may prefer driving with high acceleration, whereas a conservative driver prefers a safer driving style. We therefore propose combining apprenticeship learning with deep reinforcement learning so that the agent can learn driving and stopping behaviors with continuous actions. We use the gradient inverse reinforcement learning (GIRL) algorithm to recover the unknown reward function, and employ REINFORCE as well as the Deep Deterministic Policy Gradient (DDPG) algorithm to learn the optimal policy. The method is evaluated in a simulation-based scenario, and the results show that after training the agent drives in a human-like manner and even performs better in some aspects.
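The reward-recovery step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual GIRL setting: the reward is a linear combination of features with weights on the probability simplex, and the expert's policy is a stationary point of the expected return, so the weighted sum of per-feature policy gradients should be close to zero. The function names, the projected-gradient solver, the step size, and the toy gradients are all hypothetical choices for illustration and are not taken from the paper.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold of the last valid k
    return [max(x - theta, 0.0) for x in v]


def recover_reward_weights(feature_gradients, steps=500, lr=0.25):
    """Find simplex weights w minimizing ||sum_i w_i * g_i||^2.

    feature_gradients[i] is an (assumed, precomputed) estimate of the
    gradient of the expected return of reward feature i with respect to
    the expert policy parameters. lr is an assumed step size; it must be
    small enough for the given gradients.
    """
    q = len(feature_gradients)
    d = len(feature_gradients[0])
    w = [1.0 / q] * q  # start from uniform weights
    for _ in range(steps):
        # Policy gradient under the current reward weights.
        combined = [
            sum(w[i] * feature_gradients[i][j] for i in range(q))
            for j in range(d)
        ]
        # d/dw_i of ||combined||^2 is 2 * <g_i, combined>.
        grad = [
            2.0 * sum(feature_gradients[i][j] * combined[j] for j in range(d))
            for i in range(q)
        ]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(q)])
    return w


# Toy example: the first two feature gradients cancel each other out.
toy_gradients = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
weights = recover_reward_weights(toy_gradients)
print([round(x, 4) for x in weights])
```

In this toy case the weights converge to approximately 0.5, 0.5, and 0, the only convex combination that cancels the gradient. In practice the per-feature gradients would have to be estimated from expert demonstrations, which this sketch does not cover.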