Paper Title
Context-empowered Visual Attention Prediction in Pedestrian Scenarios
Paper Authors
Paper Abstract
Effective and flexible allocation of visual attention is key for pedestrians who have to navigate to a desired goal under different conditions of urgency and safety preference. While automatic modeling of pedestrian attention holds great promise for improving simulations of pedestrian behavior, current saliency prediction approaches mostly focus on generic free-viewing scenarios and do not reflect the specific challenges of pedestrian attention prediction. In this paper, we present Context-SalNET, a novel encoder-decoder architecture that explicitly addresses three key challenges of visual attention prediction in pedestrians: First, Context-SalNET explicitly models the context factors urgency and safety preference in the latent space of the encoder-decoder model. Second, we propose the exponentially weighted mean squared error loss (ew-MSE), which better copes with the fact that only a small fraction of the ground-truth saliency map entries are non-zero. Third, we explicitly model epistemic uncertainty to account for the fact that training data for pedestrian attention prediction is limited. To evaluate Context-SalNET, we recorded the first dataset of pedestrian visual attention in VR that includes explicit variation of the context factors urgency and safety preference. Context-SalNET achieves clear improvements over state-of-the-art saliency prediction approaches as well as over ablations. Our novel dataset will be made fully available and can serve as a valuable resource for further research on pedestrian attention prediction.
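The abstract does not give the exact formulation of the ew-MSE loss, only its motivation: ground-truth saliency maps are mostly zero, so a plain MSE is dominated by the background. A minimal NumPy sketch of one plausible formulation is shown below; the function name `ew_mse`, the weighting `exp(alpha * gt)`, and the parameter `alpha` are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def ew_mse(pred, gt, alpha=1.0):
    """Exponentially weighted MSE (hypothetical form).

    Each pixel's squared error is weighted by exp(alpha * gt), so the
    few non-zero ground-truth entries contribute exponentially more to
    the loss than the zero background does.
    """
    w = np.exp(alpha * gt)                  # weight grows with ground-truth saliency
    return float(np.mean(w * (pred - gt) ** 2))

# Toy example: a 4x4 map with a single salient pixel.
gt = np.zeros((4, 4))
gt[1, 1] = 1.0
pred_blank = np.zeros((4, 4))               # prediction that misses the salient pixel

# Missing the salient pixel is penalised more heavily than under plain MSE.
plain = float(np.mean((pred_blank - gt) ** 2))
weighted = ew_mse(pred_blank, gt)
```

Under this sketch, a perfect prediction still yields zero loss, while errors on salient pixels are up-weighted by a factor of `exp(alpha)` relative to errors on the background.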