Title
Behavioral decision-making for urban autonomous driving in the presence of pedestrians using Deep Recurrent Q-Network
Authors
Abstract
Decision making for autonomous driving in urban environments is challenging due to the complexity of the road structure and the uncertainty in the behavior of diverse road users. Traditional methods consist of manually designed rules serving as the driving policy; these require expert domain knowledge, are difficult to generalize, and may give sub-optimal results as the environment becomes complex. In contrast, with reinforcement learning an optimal driving policy can be learned and improved automatically through repeated interactions with the environment. However, current research on reinforcement learning for autonomous driving focuses mainly on highway settings, with little to no emphasis on urban environments. In this work, a deep reinforcement learning based decision-making approach for high-level driving behavior is proposed for urban environments in the presence of pedestrians. To this end, the use of the Deep Recurrent Q-Network (DRQN) is explored, a method combining the state-of-the-art Deep Q-Network (DQN) with a long short-term memory (LSTM) layer that helps the agent gain a memory of the environment. A 3-D state representation is designed as the input, combined with a well-defined reward function, to train the agent to learn an appropriate behavior policy in a realistic urban simulator. The proposed method is evaluated on dense urban scenarios and compared with a rule-based approach; the results show that the proposed DRQN-based driving behavior decision maker outperforms the rule-based approach.
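The DRQN architecture the abstract describes, a recurrent layer over a history of state observations followed by a Q-value head, can be sketched as below. This is a minimal illustrative forward pass only, not the paper's implementation: the observation dimension, hidden size, and the three-way high-level action set (keep speed / accelerate / brake) are all assumptions, and a real agent would flatten the 3-D state grid into the observation vector and train the weights with a DQN-style loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DRQNSketch:
    """Minimal DRQN forward pass: an LSTM cell unrolled over a
    sequence of observations, followed by a linear Q-value head.

    All dimensions and the action set are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_dim + hidden_dim
        # One weight matrix and bias per LSTM gate:
        # input (i), forget (f), output (o), candidate cell (c).
        self.W = {g: rng.normal(0, 0.1, (hidden_dim, d)) for g in "ifoc"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifoc"}
        # Linear head mapping the final hidden state to Q-values.
        self.Wq = rng.normal(0, 0.1, (n_actions, hidden_dim))
        self.bq = np.zeros(n_actions)
        self.hidden_dim = hidden_dim

    def forward(self, obs_seq):
        """Run the recurrent Q-network over a sequence of flattened
        observations; return Q-values for the final timestep."""
        h = np.zeros(self.hidden_dim)  # hidden state (the agent's memory)
        c = np.zeros(self.hidden_dim)  # cell state
        for x in obs_seq:
            z = np.concatenate([x, h])
            i = sigmoid(self.W["i"] @ z + self.b["i"])  # input gate
            f = sigmoid(self.W["f"] @ z + self.b["f"])  # forget gate
            o = sigmoid(self.W["o"] @ z + self.b["o"])  # output gate
            g = np.tanh(self.W["c"] @ z + self.b["c"])  # candidate cell
            c = f * c + i * g
            h = o * np.tanh(c)
        return self.Wq @ h + self.bq

# Greedy action from a short history of flattened state observations.
net = DRQNSketch(obs_dim=32, hidden_dim=16, n_actions=3)
history = [np.random.default_rng(1).normal(size=32) for _ in range(4)]
q_values = net.forward(history)
action = int(np.argmax(q_values))  # index into the high-level action set
```

The recurrence is the point of the architecture: because pedestrians' intentions are only partially observable from a single frame, carrying a hidden state across timesteps lets the Q-values depend on recent motion history rather than on the current observation alone.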