Paper Title
Saccade Mechanisms for Image Classification, Object Detection and Tracking
Paper Authors
Paper Abstract
We examine how the saccade mechanism from biological vision can be used to make deep neural networks more efficient for classification and object detection problems. Our approach builds on attention-driven visual processing and saccades, the miniature eye movements influenced by attention. We conduct experiments that analyze: i) the robustness of different deep neural network (DNN) feature extractors to partially sensed images for image classification and object detection, and ii) the utility of saccades in masking image patches for image classification and object tracking. Experiments with convolutional nets (ResNet-18) and transformer-based models (ViT, DETR, TransTrack) are conducted on several datasets (CIFAR-10, DAVSOD, MSCOCO, and MOT17). Our experiments show that learning to mimic human saccades provides intelligent data reduction when used in conjunction with state-of-the-art DNNs for classification, detection, and tracking tasks. We observe a minimal drop in performance for the classification and detection tasks while using only about 30% of the original sensor data. We discuss how the saccade mechanism can inform hardware design via "in-pixel" processing.
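To make the idea of a partially sensed image concrete, the following is a minimal sketch of patch masking at a fixed sensing budget. It is not the authors' method: the function name `mask_patches`, its parameters, and the random scores that stand in for a learned saccade model are all assumptions introduced for illustration. The image is split into square patches, the highest-scoring patches are kept, and every other pixel is set to zero.

```python
import random


def mask_patches(image, patch_size, keep_fraction, scores=None, seed=0):
    """Zero out all but the top-scoring patches of a 2-D image (list of rows).

    Hypothetical helper: `scores` would come from a saliency or saccade
    model; random scores are used here only as a placeholder.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    cols = width // patch_size
    num_patches = (height // patch_size) * cols
    if scores is None:
        # Placeholder saliency (assumption): one random score per patch.
        rng = random.Random(seed)
        scores = [rng.random() for _ in range(num_patches)]
    num_keep = max(1, round(keep_fraction * num_patches))
    # Patches with the highest scores are the ones that get "sensed".
    ranked = sorted(range(num_patches), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:num_keep])
    masked = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            patch_index = (y // patch_size) * cols + (x // patch_size)
            if patch_index in keep:
                masked[y][x] = image[y][x]
    return masked, keep


image = [[1] * 32 for _ in range(32)]        # toy 32x32 single-channel image
masked, kept = mask_patches(image, patch_size=4, keep_fraction=0.3)
sensed = sum(map(sum, masked)) / (32 * 32)   # fraction of pixels retained
```

With 4x4 patches on a 32x32 image there are 64 patches, so a 0.3 budget keeps 19 of them, which is roughly 30% of the pixels. Passing model-derived `scores` in place of the random placeholder is where a learned saccade predictor would plug in.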