Paper Title
AlignSeg: Feature-Aligned Segmentation Networks
Paper Authors
Paper Abstract
Aggregating features from different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most current popular network architectures tend to ignore the misalignment issues that arise during feature aggregation, caused by 1) step-by-step downsampling operations and 2) indiscriminate contextual information fusion. In this paper, we explore principles for addressing such feature misalignment issues and propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn per-pixel transformation offsets, which effectively relieves the feature misalignment caused by multi-resolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to select its own customized contextual information in an adaptive manner, making the contextual embeddings better aligned to provide appropriate guidance. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6% and 45.95%, respectively. Our source code will be made available.
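To make the AlignFA idea concrete: instead of upsampling a low-resolution feature map with fixed bilinear interpolation, each output pixel's sampling position is shifted by a learned 2-D offset before interpolation. The sketch below is a minimal NumPy illustration of this offset-guided bilinear sampling, not the paper's implementation; in AlignSeg the offsets would be predicted by a small convolutional branch and trained end-to-end, whereas here they are supplied as a plain array (zero offsets reduce to ordinary bilinear upsampling).

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat of shape (H, W, C) at float coords (ys, xs)."""
    H, W, _ = feat.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    dy = np.clip(ys - y0, 0.0, 1.0)[..., None]
    dx = np.clip(xs - x0, 0.0, 1.0)[..., None]
    top = feat[y0, x0] * (1 - dx) + feat[y0, x0 + 1] * dx
    bot = feat[y0 + 1, x0] * (1 - dx) + feat[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bot * dy

def aligned_upsample(low, offsets, out_h, out_w):
    """Upsample low-res features (h, w, C) to (out_h, out_w, C),
    shifting each output pixel's sampling point by a per-pixel offset
    (offsets has shape (out_h, out_w, 2), in low-res coordinate units)."""
    h, w, _ = low.shape
    gy, gx = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
    # Map the output grid onto low-res coordinates, then add the offsets.
    ys = gy * (h - 1) / max(out_h - 1, 1) + offsets[..., 0]
    xs = gx * (w - 1) / max(out_w - 1, 1) + offsets[..., 1]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    return bilinear_sample(low, ys, xs)

low = np.random.rand(4, 4, 8)      # low-resolution feature map
offsets = np.zeros((8, 8, 2))      # zero offsets -> plain bilinear upsampling
out = aligned_upsample(low, offsets, 8, 8)
print(out.shape)  # (8, 8, 8)
```

With learned (non-zero) offsets, features that drifted spatially during repeated downsampling can be pulled back into alignment with the high-resolution branch before the two maps are summed.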