Dual Progressive Transformations for Weakly Supervised Semantic Segmentation

Paper Title

Dual Progressive Transformations for Weakly Supervised Semantic Segmentation

Authors

Dongjian Huo, Yukun Su, Qingyao Wu

Abstract

Weakly supervised semantic segmentation (WSSS), which aims to mine object regions using only class-level labels, is a challenging task in computer vision. Current state-of-the-art CNN-based methods usually adopt Class Activation Maps (CAMs) to highlight the potential areas of an object, but they may suffer from the part-activation issue, in which only part of the object is highlighted. To this end, we make an early attempt to explore the global feature attention mechanism of the vision transformer in the WSSS task. However, since the transformer lacks the inductive bias of CNN models, it cannot boost performance directly and may yield over-activated maps. To tackle these drawbacks, we propose a Convolutional Neural Networks Refined Transformer (CRT) to mine globally complete and locally accurate class activation maps. To validate the effectiveness of the proposed method, extensive experiments are conducted on the PASCAL VOC 2012 and CUB-200-2011 datasets. Experimental evaluations show that the proposed CRT achieves new state-of-the-art performance on both the weakly supervised semantic segmentation task and the weakly supervised object localization task, outperforming other methods by a large margin.
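For readers unfamiliar with the class activation maps the abstract refers to, the following is a minimal sketch of the standard CAM computation only, not of the CRT method proposed in the paper. It assumes the usual formulation in which the map for one class is the weighted sum of the final feature maps, M(x, y) = sum over k of w_k * f_k(x, y), followed by clipping negative values and scaling to [0, 1]. The function name, the toy feature maps, the weights, and the 0.5 foreground threshold are all hypothetical and chosen purely for illustration.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of K feature maps (each H x W, as nested lists) for one class.

    Illustrative helper, not taken from the paper: `class_weights` holds the
    K classifier weights of the chosen class.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    # Keep only positive evidence, then scale so the peak response is 1.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2x2 feature maps and one (assumed) weight per map.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 1.0]],
]
class_weights = [0.5, 1.0]

cam = class_activation_map(features, class_weights)

# Hypothetical threshold turning the map into a coarse foreground mask.
mask = [[value >= 0.5 for value in row] for row in cam]
```

In this toy case the top-left location receives a weak response and falls below the threshold, which loosely mirrors the part-activation behaviour the abstract describes: locations with weaker evidence for the class are dropped from the resulting mask.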
