Paper Title

Enhanced method for reinforcement learning based dynamic obstacle avoidance by assessment of collision risk

Authors

Fabian Hart, Ostap Okhrin

Abstract

In the field of autonomous robots, reinforcement learning (RL) is an increasingly used method for solving the task of dynamic obstacle avoidance for mobile robots, autonomous ships, and drones. A common practice for training these agents is to use a training environment with random initialization of agent and obstacles. Such approaches might suffer from low coverage of high-risk scenarios in training, leading to impaired final performance of obstacle avoidance. This paper proposes a general training environment where we gain control over the difficulty of the obstacle avoidance task by using short training episodes and assessing the difficulty by two metrics: the number of obstacles and a collision risk metric. We found that shifting the training towards greater task difficulty can substantially increase the final performance. A baseline agent, trained in a traditional environment with random initialization of agent and obstacles and longer training episodes, achieves significantly weaker performance. To demonstrate the generalizability of the proposed approach, we designed two realistic use cases: a mobile robot and a maritime ship under the threat of approaching obstacles. In both applications, the previous results can be confirmed, which emphasizes the general usability of the proposed approach, detached from a specific application context and independent of the agent's dynamics. We further added Gaussian noise to the sensor signals, resulting in only a marginal degradation of performance and thus indicating solid robustness of the trained agent.
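The abstract does not define the collision risk metric or the scenario sampler, so the following is only a minimal sketch of the general idea: controlling task difficulty through the number of obstacles and a minimum risk level. Everything in it is an assumption made for illustration, not the authors' method. The risk score is built from the closest point of approach (distance `dcpa` and time `tcpa`), and the names `cpa`, `collision_risk`, `sample_scenario`, `d_safe`, and `t_horizon` are hypothetical. The agent is assumed to sit at rest at the origin, so obstacle position and velocity are already relative to it.

```python
import math
import random


def cpa(rel_pos, rel_vel):
    """Return (tcpa, dcpa): time and distance at the closest point of approach.

    rel_pos: obstacle position minus agent position (x, y)
    rel_vel: obstacle velocity minus agent velocity (vx, vy)
    """
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # No relative motion: the distance never changes.
        return 0.0, math.hypot(px, py)
    # Clamp to zero so that a closest approach in the past is ignored.
    tcpa = max(0.0, -(px * vx + py * vy) / speed_sq)
    dcpa = math.hypot(px + vx * tcpa, py + vy * tcpa)
    return tcpa, dcpa


def collision_risk(rel_pos, rel_vel, d_safe=1.0, t_horizon=10.0):
    """Hypothetical risk score in [0, 1]: high when the closest approach is near and soon."""
    tcpa, dcpa = cpa(rel_pos, rel_vel)
    dist_term = max(0.0, 1.0 - dcpa / d_safe)
    time_term = max(0.0, 1.0 - tcpa / t_horizon)
    return dist_term * time_term


def sample_scenario(rng, n_obstacles, min_risk, arena=10.0, max_speed=1.0, max_tries=10000):
    """Rejection-sample obstacles until each one reaches at least min_risk.

    Difficulty is steered by two knobs: n_obstacles and min_risk.
    """
    obstacles = []
    for _ in range(n_obstacles):
        for _ in range(max_tries):
            pos = (rng.uniform(-arena, arena), rng.uniform(-arena, arena))
            vel = (rng.uniform(-max_speed, max_speed), rng.uniform(-max_speed, max_speed))
            if collision_risk(pos, vel) >= min_risk:
                obstacles.append((pos, vel))
                break
        else:
            raise RuntimeError("could not reach the requested risk level")
    return obstacles


if __name__ == "__main__":
    rng = random.Random(0)
    for pos, vel in sample_scenario(rng, n_obstacles=3, min_risk=0.3):
        print(pos, vel, round(collision_risk(pos, vel), 3))
```

Raising `min_risk` or `n_obstacles` shifts the sampled episodes towards harder situations, whereas purely random initialization (`min_risk=0.0`) accepts every draw and rarely produces a high-risk encounter. Rejection sampling is used here only because it is simple; a real training environment would more likely construct high-risk initial states directly.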
