Learning to Socially Navigate in Pedestrian-rich Environments with Interaction Capacity

Paper Title

Learning to Socially Navigate in Pedestrian-rich Environments with Interaction Capacity

Authors

Quecheng Qiu, Shunyi Yao, Jing Wang, Jun Ma, Guangda Chen, Jianmin Ji

Abstract

Existing navigation policies for autonomous robots tend to focus on collision avoidance while ignoring human-robot interactions in social life. For instance, a robot can pass through a corridor more safely and easily if pedestrians notice it. Sound has been considered an efficient way to attract the attention of pedestrians, which can alleviate the freezing robot problem. In this work, we present a new deep reinforcement learning (DRL) based social navigation approach for autonomous robots to move in pedestrian-rich environments with interaction capacity. Most existing DRL-based methods intend to train a general policy that outputs both navigation actions, i.e., the robot's expected linear and angular velocities, and interaction actions, i.e., the beep action, entirely in the context of reinforcement learning. Unlike these methods, we train the policy via both supervised learning and reinforcement learning. Specifically, we first train an interaction policy in the context of supervised learning, which provides a better understanding of the social situation; we then use this interaction policy to train the navigation policy via multiple reinforcement learning algorithms. We evaluate our approach in various simulation environments and compare it to other methods. The experimental results show that our approach outperforms the others in terms of success rate. We also deploy the trained policy on a real-world robot, which performs well in crowded environments.
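
The two-stage idea in the abstract (a supervised interaction policy that is later reused alongside a navigation policy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two input features (distance to the nearest pedestrian, whether that pedestrian faces away), the toy labelling rule, the logistic-regression classifier, and the names `train_interaction_policy`, `should_beep`, `placeholder_nav_policy` and `compose_action` are all hypothetical. The reinforcement learning step that would train the navigation policy is omitted; a fixed placeholder stands in for it.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_labels(n, seed=0):
    """Hypothetical labelled data: beep when a pedestrian is close AND facing away."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        dist = rng.uniform(0.0, 5.0)          # distance to nearest pedestrian (m)
        facing_away = rng.choice([0.0, 1.0])  # 1.0 if the pedestrian cannot see the robot
        label = 1.0 if (dist < 2.0 and facing_away == 1.0) else 0.0
        data.append(((dist, facing_away), label))
    return data


def train_interaction_policy(samples, epochs=200, lr=0.1):
    """Stage 1 (supervised): fit a logistic-regression beep classifier with SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = _sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            g = p - y  # gradient of the log-loss with respect to the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b


def should_beep(params, x):
    """Frozen interaction policy: True when the classifier's logit is positive."""
    w, b = params
    return w[0] * x[0] + w[1] * x[1] + b > 0.0


def placeholder_nav_policy(x):
    """Stand-in for the RL-trained navigation policy (assumption: slow down when close)."""
    dist, _ = x
    linear = min(1.0, 0.2 * dist)
    angular = 0.0
    return linear, angular


def compose_action(nav_policy, interaction_params, x):
    """Stage 2 usage: navigation action from one policy, beep from the frozen one."""
    linear, angular = nav_policy(x)
    return linear, angular, should_beep(interaction_params, x)


params = train_interaction_policy(make_toy_labels(200))
action = compose_action(placeholder_nav_policy, params, (0.5, 1.0))
```

In this sketch the interaction policy is trained once and then only queried, so a navigation policy trained afterwards would only need to output the two velocity components rather than a joint velocity-and-beep action.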
