Pyramidal Attention for Saliency Detection

Paper Title

Pyramidal Attention for Saliency Detection

Authors

Tanveer Hussain, Abbas Anwar, Saeed Anwar, Lars Petersson, Sung Wook Baik

Abstract

Salient object detection (SOD) extracts meaningful content from an input image. RGB-based SOD methods lack complementary depth cues and hence offer limited performance in complex scenarios. RGB-D models process both RGB and depth inputs, but the need for depth data at test time may hinder their practical applicability. This paper uses only RGB images: it estimates depth from RGB and leverages the intermediate depth features. We employ a pyramidal attention structure to extract multi-level convolutional-transformer features, which process the initial-stage representations and further enhance the subsequent ones. At each stage, the backbone transformer model provides global receptive fields and parallel computation to attain fine-grained global predictions, which our residual convolutional attention decoder refines for optimal saliency prediction. We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively. Consequently, we present a new perspective on SOD: performing RGB-D SOD without acquiring depth data during training or testing, and assisting RGB methods with depth cues for improved performance. The code and trained models are available at https://github.com/tanveer-hussain/EfficientSOD2
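
The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of the RGB-only contract (depth is estimated, never supplied), not the authors' implementation. Every name here (`predict_saliency`, `estimate_depth`, `extract_pyramid`, `decode`, and the toy stand-ins) is hypothetical, and the sketch assumes the depth features are obtained by running a multi-level extractor on the estimated depth map, which the abstract does not specify.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]   # H x W grid of values; stand-in for a real tensor
Features = List[float]      # one feature vector per pyramid stage


def predict_saliency(
    rgb: Image,
    estimate_depth: Callable[[Image], Image],
    extract_pyramid: Callable[[Image], Sequence[Features]],
    decode: Callable[[Sequence[Features], Sequence[Features]], Image],
) -> Image:
    """Hypothetical RGB-only pipeline that still uses depth cues."""
    depth = estimate_depth(rgb)            # depth is predicted from RGB, not an input
    rgb_stages = extract_pyramid(rgb)      # multi-level features from the RGB image
    depth_stages = extract_pyramid(depth)  # multi-level features from estimated depth
    return decode(rgb_stages, depth_stages)


# Toy stand-ins so the sketch runs; real components would be neural networks.
def toy_depth(img: Image) -> Image:
    return [[1.0 - v for v in row] for row in img]


def toy_pyramid(img: Image) -> Sequence[Features]:
    flat = [v for row in img for v in row]
    return [[sum(flat) / len(flat)], [max(flat)], [min(flat)]]


def toy_decode(rgb_stages: Sequence[Features],
               depth_stages: Sequence[Features]) -> Image:
    # Combine matching stages; a real decoder would refine coarse-to-fine.
    products = [r[0] * d[0] for r, d in zip(rgb_stages, depth_stages)]
    return [[sum(products) / len(products)]]


if __name__ == "__main__":
    print(predict_saliency([[0.0, 0.5], [1.0, 0.5]],
                           toy_depth, toy_pyramid, toy_decode))
```

The caller passes only an RGB image, so a model with this shape can be used wherever an RGB-only method is used, while the decoder still sees depth-derived features.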
