Precise Single-stage Detector

Paper title

Precise Single-stage Detector

Authors

Chandio, Aisha; Gui, Gong; Kumar, Teerath; Ullah, Irfan; Ranjbarzadeh, Ramin; Roy, Arunabha M; Hussain, Akhtar; Shen, Yao

Abstract

Two problems in SSD still lead to inaccurate results: (1) during feature extraction, local information is gradually lost as semantic information is acquired layer by layer, resulting in less representative feature maps; (2) because the classification and regression tasks are inconsistent, the classification confidence and the predicted detection position used by the Non-Maximum Suppression (NMS) algorithm cannot accurately indicate the position of the prediction boxes. Methods: To address these issues, we propose a new architecture, a modified version of the Single Shot MultiBox Detector (SSD), named Precise Single Stage Detector (PSSD). First, we improve the features by adding extra layers to SSD. Second, we construct a simple and effective feature enhancement module that expands the receptive field step by step for each layer and enhances its local and semantic information. Finally, we design a more efficient loss function to predict the IoU between the prediction boxes and the ground-truth boxes; the thresholded IoU guides classification training and attenuates the scores used by the NMS algorithm. Main Results: Benefiting from these optimizations, the proposed PSSD model achieves strong real-time performance. Specifically, on a Titan Xp with an input size of 320 pixels, PSSD achieves 33.8 mAP at 45 FPS on the MS COCO benchmark and 81.28 mAP at 66 FPS on Pascal VOC 2007, outperforming state-of-the-art object detection models. The proposed model also performs well with a larger input size: at 512 pixels, PSSD obtains 37.2 mAP at 27 FPS on MS COCO and 82.82 mAP at 40 FPS on Pascal VOC 2007. The experimental results show that the proposed model offers a better trade-off between speed and accuracy.
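
The abstract's idea of letting a predicted IoU attenuate the scores used by NMS can be illustrated with a minimal sketch. The abstract does not give the exact attenuation rule, so the code below assumes the simplest form: each box's classification score is multiplied by its predicted IoU before standard greedy NMS. The function names `iou` and `iou_guided_nms`, the `nms_threshold` parameter, and the multiplicative rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_guided_nms(
    boxes: Sequence[Box],
    cls_scores: Sequence[float],
    pred_ious: Sequence[float],
    nms_threshold: float = 0.5,
) -> List[int]:
    """Greedy NMS ranked by an IoU-attenuated score.

    Assumption (not from the paper): the ranking score is
    classification score * predicted IoU.
    """
    scores = [c * q for c, q in zip(cls_scores, pred_ious)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    cls_scores = [0.9, 0.8, 0.7]   # classification confidence
    pred_ious = [0.5, 0.9, 0.8]    # predicted localization quality
    print(iou_guided_nms(boxes, cls_scores, pred_ious))
```

In this toy input the first box has the highest classification score but a low predicted IoU, so the better-localized second box outranks it and suppresses it. Ranking by classification score alone would keep the first box instead.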
