Title
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
Authors
Abstract
Existing semantic segmentation approaches either aim to improve objects' inner consistency by modeling the global context, or refine object details along their boundaries through multi-scale feature fusion. In this paper, a new paradigm for semantic segmentation is proposed. Our insight is that appealing semantic segmentation performance requires \textit{explicitly} modeling the object \textit{body} and \textit{edge}, which correspond to the low- and high-frequency components of the image, respectively. To do so, we first warp the image feature by learning a flow field that makes the features within each object part more consistent. The resulting body feature and the residual edge feature are then optimized under decoupled supervision by explicitly sampling pixels from the different parts (body or edge). We show that the proposed framework, applied to various baselines and backbone networks, leads to better object inner consistency and better object boundaries. Extensive experiments on four major road-scene semantic segmentation benchmarks, \textit{Cityscapes}, \textit{CamVid}, \textit{KITTI} and \textit{BDD}, show that our approach establishes a new state of the art while retaining high inference efficiency. In particular, we achieve 83.7\% mIoU on Cityscapes using only finely annotated data. Code and models are available to foster further research (\url{https://github.com/lxtGH/DecoupleSegNets}).
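To make the body/edge decoupling concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation (that lives in the linked repository). Assumptions: the flow field is passed in directly (in the described method it is learned by a network, not shown); warping uses bilinear sampling with border clamping; "edge" pixels are taken to be ground-truth label boundaries; and both branches are supervised with a plain masked cross-entropy, body on non-boundary pixels and edge on boundary pixels, which simplifies whatever losses the paper actually uses. All helper names (`bilinear_warp`, `decouple`, `boundary_mask`, `masked_cross_entropy`, `decoupled_losses`) are hypothetical.

```python
import numpy as np


def bilinear_warp(feature, flow):
    """Warp a (C, H, W) feature map with a (2, H, W) flow field.

    Output pixel (y, x) bilinearly samples `feature` at
    (y + flow[1, y, x], x + flow[0, y, x]); sampling positions are
    clamped to the image border.
    """
    _, h, w = feature.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0  # horizontal interpolation weight, shape (H, W)
    wy = sy - y0  # vertical interpolation weight, shape (H, W)
    top = feature[:, y0, x0] * (1 - wx) + feature[:, y0, x1] * wx
    bottom = feature[:, y1, x0] * (1 - wx) + feature[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def decouple(feature, flow):
    """Split a feature into a warped 'body' part and a residual 'edge' part."""
    body = bilinear_warp(feature, flow)
    edge = feature - body  # whatever the warped body feature does not explain
    return body, edge


def boundary_mask(labels):
    """Hypothetical edge definition: True where a pixel's label differs
    from at least one 4-connected neighbour."""
    edge = np.zeros(labels.shape, dtype=bool)
    diff_v = labels[1:, :] != labels[:-1, :]
    diff_h = labels[:, 1:] != labels[:, :-1]
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    return edge


def masked_cross_entropy(logits, labels, mask):
    """Mean pixel-wise cross-entropy over pixels where `mask` is True.

    logits: (K, H, W) class scores; labels: (H, W) ints in [0, K);
    mask: (H, W) bool. Returns 0.0 if the mask selects no pixels.
    """
    if not mask.any():
        return 0.0
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = labels.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    nll = -log_probs[labels, ys, xs]  # (H, W) negative log-likelihood
    return float(nll[mask].mean())


def decoupled_losses(body_logits, edge_logits, labels):
    """Simplified decoupled supervision: the body branch is scored only on
    non-boundary pixels and the edge branch only on boundary pixels."""
    edge = boundary_mask(labels)
    body_loss = masked_cross_entropy(body_logits, labels, ~edge)
    edge_loss = masked_cross_entropy(edge_logits, labels, edge)
    return body_loss, edge_loss
```

With a zero flow, `decouple` returns the input unchanged as the body and an all-zero edge residual. Assuming the learned flow points each pixel toward the interior of its object, the warped body feature becomes smoother within objects, and what it fails to reproduce, mostly near boundaries, is left in the edge residual that the edge branch is supervised on.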