Paper Title
G-RCN: Optimizing the Gap between Classification and Localization Tasks for Object Detection
Paper Authors
Abstract
Multi-task learning is widely used in computer vision. Current object detection models use a shared feature map to complete the classification and localization tasks simultaneously. By comparing the performance of the original Faster R-CNN with a variant using partially separated feature maps, we show that: (1) sharing high-level features between the classification and localization tasks is sub-optimal; (2) a large stride is beneficial for classification but harmful for localization; and (3) global context information can improve classification performance. Based on these findings, we propose a paradigm called the Gap-optimized Region-based Convolutional Network (G-RCN), which aims to separate the two tasks and optimize the gap between them. The paradigm is first applied to correct the current ResNet protocol by simply reducing the stride and moving the Conv5 block from the head into the feature extraction network, which brings a 3.6-point improvement in AP70 on the PASCAL VOC dataset and a 1.5-point improvement in AP on the COCO dataset for ResNet50. The method is then applied to Faster R-CNN with VGG16, ResNet50, and ResNet101 backbones, yielding improvements above 2.0 points in AP70 on the PASCAL VOC dataset and above 1.9 points in AP on the COCO dataset. Notably, implementing G-RCN involves only a few structural modifications, with no extra modules added.
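The structural change the abstract describes can be made concrete with some simple stride arithmetic. The sketch below is illustrative only: the per-stage strides follow the standard ResNet convention (each of conv1–conv5 halves resolution), and the specific numbers are assumptions, not values taken from the paper. It contrasts the original protocol, where Conv5 (stride 2) sits in the per-RoI head and further downsamples RoI features, with the modification of moving Conv5 into the backbone with its stride reduced to 1.

```python
# Illustrative stride arithmetic for the ResNet Faster R-CNN protocol vs. the
# G-RCN-style modification (Conv5 moved from the head into the backbone with
# its stride reduced). Stage strides are the standard ResNet convention and
# are assumptions for illustration, not figures from the paper.

def effective_stride(stage_strides):
    """Product of per-stage strides = overall downsampling factor."""
    s = 1
    for st in stage_strides:
        s *= st
    return s

# Original protocol: backbone = conv1..conv4 (strides 2,2,2,2 -> overall 16);
# conv5 (stride 2) is in the per-RoI head, so pooled RoI features are
# downsampled once more, e.g. a 14x14 RoI feature becomes 7x7.
backbone_orig = [2, 2, 2, 2]   # conv1..conv4
head_orig = [2]                # conv5 inside the head

# Modified protocol: conv5 moved into the backbone with stride reduced to 1,
# so the shared feature map keeps stride 16 and the head no longer
# downsamples RoI features (finer resolution for localization).
backbone_grcn = [2, 2, 2, 2, 1]  # conv1..conv5 (conv5 stride reduced)
head_grcn = []                   # head applies no further downsampling

print(effective_stride(backbone_orig))   # 16
print(effective_stride(head_orig))       # 2: extra downsampling of RoI features
print(effective_stride(backbone_grcn))   # 16: same backbone stride, head stride 1
```

Under this view, finding (2) corresponds to the head-side stride: the extra factor of 2 coarsens localization while leaving classification largely unharmed, which is why removing it from the head (rather than from the backbone) is the relevant fix.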