Distribution Regularized Self-Supervised Learning for Domain Adaptation of Semantic Segmentation

Paper Title

Distribution Regularized Self-Supervised Learning for Domain Adaptation of Semantic Segmentation

Authors

Javed Iqbal, Hamza Rawal, Rehan Hafiz, Yu-Tseh Chi, Mohsen Ali

Abstract

This paper proposes a novel pixel-level distribution regularization scheme (DRSL) for self-supervised domain adaptation of semantic segmentation. In a typical setting, the classification loss forces the semantic segmentation model to greedily learn representations that capture inter-class variations in order to determine the decision (class) boundary. Due to domain shift, this decision boundary is unaligned in the target domain, resulting in noisy pseudo-labels that adversely affect self-supervised domain adaptation. To overcome this limitation, along with capturing inter-class variation, we capture pixel-level intra-class variations through class-aware multi-modal distribution learning (MMDL). The information necessary for capturing intra-class variations is thus explicitly disentangled from the information necessary for inter-class discrimination. The resulting features are much more informative, yielding pseudo-labels with low noise. This disentanglement allows us to perform separate alignments in the discriminative space and the multi-modal distribution space, using cross-entropy based self-learning for the former. For the latter, we propose a novel stochastic mode alignment method that explicitly decreases the distance between target and source pixels that map to the same mode. The distance metric learning loss, computed over pseudo-labels and backpropagated from the multi-modal modeling head, acts as a regularizer over the base network shared with the segmentation head. Comprehensive experiments on synthetic-to-real domain adaptation setups, i.e., GTA-V/SYNTHIA to Cityscapes, show that DRSL outperforms many existing approaches (a minimum margin of 2.3% and 2.5% in mIoU for SYNTHIA to Cityscapes).
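The mode alignment idea in the abstract (reducing the distance between target and source pixels that map to the same mode) can be illustrated with a small sketch. This is not the authors' implementation; the abstract does not specify the loss. The function names, the use of squared Euclidean distance, the plain-list feature vectors, and the reading of "stochastic" as random sampling of pixel pairs are all assumptions made here for illustration. Target class and mode ids are assumed to come from pseudo-labels.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mode_alignment_loss(source, target, num_pairs=None, rng=None):
    """Hypothetical same-mode distance loss (illustrative only).

    source, target: lists of (feature_vector, class_id, mode_id) tuples.
    For target pixels, class_id and mode_id are assumed to be pseudo-labels.
    Returns the mean squared distance over source-target pairs that share
    the same (class_id, mode_id). If num_pairs is given and smaller than the
    number of matching pairs, a random subset of that size is used instead.
    """
    # Group source features by their (class, mode) assignment.
    by_mode = {}
    for feat, cls, mode in source:
        by_mode.setdefault((cls, mode), []).append(feat)

    # Pair each target pixel with every source pixel in the same mode.
    pairs = [
        (t_feat, s_feat)
        for t_feat, cls, mode in target
        for s_feat in by_mode.get((cls, mode), [])
    ]
    if not pairs:
        return 0.0

    # Optional random subsampling of pairs (assumed meaning of "stochastic").
    if num_pairs is not None and num_pairs < len(pairs):
        rng = rng or random.Random(0)
        pairs = rng.sample(pairs, num_pairs)

    return sum(squared_distance(t, s) for t, s in pairs) / len(pairs)


if __name__ == "__main__":
    src = [([0.0, 0.0], 0, 0), ([1.0, 1.0], 0, 1)]
    tgt = [([0.0, 1.0], 0, 0), ([1.0, 1.0], 0, 1), ([5.0, 5.0], 1, 0)]
    print(mode_alignment_loss(src, tgt))
```

In a real training setup such a term would be computed on network features and minimized by gradient descent alongside the segmentation loss; the sketch only shows which pixel pairs contribute to the distance.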
