Paper title
SALISA: Saliency-based Input Sampling for Efficient Video Object Detection
Paper authors
Paper abstract
High-resolution images are widely adopted for high-performance object detection in videos. However, processing high-resolution inputs comes with high computation costs, and naive down-sampling of the input to reduce the computation costs quickly degrades the detection performance. In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection that allows for heavy down-sampling of unimportant background regions while preserving the fine-grained details of a high-resolution image. The resulting image is spatially smaller, leading to reduced computational costs while enabling a performance comparable to a high-resolution input. To achieve this, we propose a differentiable resampling module based on a thin plate spline spatial transformer network (TPS-STN). This module is regularized by a novel loss to provide an explicit supervision signal to learn to "magnify" salient regions. We report state-of-the-art results in the low-compute regime on the ImageNet-VID and UA-DETRAC video object detection datasets. We demonstrate that on both datasets, the mAP of an EfficientDet-D1 (EfficientDet-D2) with SALISA is on par with that of EfficientDet-D2 (EfficientDet-D3) at a much lower computational cost. We also show that SALISA significantly improves the detection of small objects. In particular, SALISA with an EfficientDet-D1 detector improves the detection of small objects by $77\%$, and remarkably also outperforms the EfficientDet-D3 baseline.
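To give a rough intuition for what non-uniform, saliency-based sampling means, the sketch below resamples an image so that rows and columns with high saliency receive more output pixels than background regions. It is not the TPS-STN module described in the abstract: it is a simple, non-learned, separable inverse-CDF warp used here only as a stand-in. The function names (`saliency_grid_1d`, `resample`) and the `uniform_mix` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def saliency_grid_1d(saliency, out_size, uniform_mix=0.5):
    """Return out_size source coordinates, denser where saliency is high.

    Illustrative stand-in for a learned warp (assumption, not the paper's method).
    uniform_mix in (0, 1] keeps some sampling density on the background.
    """
    s = np.asarray(saliency, dtype=np.float64)
    n = s.size
    total = s.sum()
    if total <= 0:  # no salient content: fall back to uniform sampling
        s, total = np.ones(n), float(n)
    # Blend saliency with a uniform density so the CDF is strictly increasing.
    density = uniform_mix / n + (1.0 - uniform_mix) * s / total
    cdf = np.concatenate([[0.0], np.cumsum(density)])
    cdf /= cdf[-1]
    # Equally spaced quantiles, mapped back through the inverse CDF.
    q = (np.arange(out_size) + 0.5) / out_size
    edges = np.arange(n + 1, dtype=np.float64)  # pixel edges 0..n
    pos = np.interp(q, cdf, edges) - 0.5        # edge coords -> pixel centers
    return np.clip(pos, 0, n - 1)


def resample(image, saliency, out_h, out_w, uniform_mix=0.5):
    """Down-sample image (H x W or H x W x C) non-uniformly using a saliency map (H x W)."""
    saliency = np.asarray(saliency, dtype=np.float64)
    ys = saliency_grid_1d(saliency.sum(axis=1), out_h, uniform_mix)
    xs = saliency_grid_1d(saliency.sum(axis=0), out_w, uniform_mix)
    # Nearest-neighbor lookup keeps the sketch short; a real module would
    # use differentiable (e.g. bilinear) sampling.
    yi = np.rint(ys).astype(int)
    xi = np.rint(xs).astype(int)
    return image[np.ix_(yi, xi)], ys, xs
```

With a 64x64 image, a saliency map that is non-zero only on a 10x10 box, and a 32x32 output, uniform down-sampling would give the box about 5 output rows, whereas this warp gives it well over half of them. A warp of this kind moves object locations, so boxes predicted on the resampled image would have to be mapped back to the original coordinates (for instance through the returned `ys` and `xs`).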