Paper Title

Visual Attention Prediction Improves Performance of Autonomous Drone Racing Agents

Authors

Christian Pfeiffer, Simon Wengeler, Antonio Loquercio, Davide Scaramuzza

Abstract

Humans race drones faster than neural networks trained for end-to-end autonomous flight. This may be related to the ability of human pilots to select task-relevant visual information effectively. This work investigates whether neural networks capable of imitating human eye gaze behavior and attention can improve neural network performance for the challenging task of vision-based autonomous drone racing. We hypothesize that gaze-based attention prediction can be an efficient mechanism for visual information selection and decision making in a simulator-based drone racing task. We test this hypothesis using eye gaze and flight trajectory data from 18 human drone pilots to train a visual attention prediction model. We then use this visual attention prediction model to train an end-to-end controller for vision-based autonomous drone racing using imitation learning. We compare the drone racing performance of the attention-prediction controller to that of controllers using raw image inputs and image-based abstractions (i.e., feature tracks). Comparing success rates for completing a challenging race track by autonomous flight, our results show that the attention-prediction based controller (88% success rate) outperforms the RGB-image (61% success rate) and feature-tracks (55% success rate) controller baselines. Furthermore, visual attention-prediction and feature-track based models showed better generalization performance than image-based models when evaluated on hold-out reference trajectories. Our results demonstrate that human visual attention prediction improves the performance of autonomous vision-based drone racing agents and provides an essential step towards vision-based, fast, and agile autonomous flight that can eventually reach and even exceed human performance.
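To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of an attention-conditioned imitation-learning policy: a small network predicts a gaze-attention map from the camera frame, the map is stacked with the image as an extra channel, and a control head is regressed against expert commands (behavior cloning). The module names (`AttentionPredictor`, `AttentionConditionedPolicy`), layer sizes, the channel-stacking of the attention map, and the 4-D command parameterization (e.g., collective thrust and body rates) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): an attention-conditioned
# end-to-end racing policy trained by imitation learning.
import torch
import torch.nn as nn

class AttentionPredictor(nn.Module):
    """Predicts a per-pixel visual-attention (gaze probability) map from an RGB frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # attention logits at 1/4 resolution
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, img):  # img: (B, 3, H, W)
        logits = self.net(img)
        b = logits.shape[0]
        # Normalize to a probability map over pixels.
        return torch.softmax(logits.view(b, -1), dim=1).view_as(logits)

class AttentionConditionedPolicy(nn.Module):
    """Maps an RGB frame plus its predicted attention map to a 4-D control command."""
    def __init__(self, attention: AttentionPredictor):
        super().__init__()
        self.attention = attention
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, img):
        attn = self.attention(img)          # (B, 1, H, W)
        x = torch.cat([img, attn], dim=1)   # stack attention map as a 4th channel
        return self.head(self.encoder(x))

# One behavior-cloning step: regress the expert (reference) command.
policy = AttentionConditionedPolicy(AttentionPredictor())
img = torch.randn(8, 3, 128, 128)           # batch of camera frames (dummy data)
expert_cmd = torch.randn(8, 4)              # expert thrust + body rates (dummy data)
loss = nn.functional.mse_loss(policy(img), expert_cmd)
loss.backward()
```

In this sketch the attention network would be pre-trained on the human gaze data and then frozen or fine-tuned while the control head is trained by imitation learning; how the attention map is actually fused with the image features in the paper may differ.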
