Title
Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization
Authors
Abstract
Domain shift widely exists in the visual world, and modern deep neural networks commonly suffer severe performance degradation under domain shift due to poor generalization ability, which limits their real-world applications. The domain shift mainly lies in the limited source environmental variations and the large distribution gap between source and unseen target data. To this end, we propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle such domain shift in various visual tasks. Specifically, SHADE is built on two consistency constraints: Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representations across style-diversified samples. RC leverages general visual knowledge to prevent the model from overfitting to the source data and thus largely keeps the representations consistent between the source and general visual models. Furthermore, we present a novel Style Hallucination Module (SHM) to generate the style-diversified samples that are essential to consistency learning. SHM selects basis styles from the source distribution, enabling the model to dynamically generate diverse and realistic samples during training. Extensive experiments demonstrate that our versatile SHADE significantly enhances generalization in various visual recognition tasks, including image classification, semantic segmentation, and object detection, with different models, i.e., ConvNets and Transformers.
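The style-hallucination idea described above — re-styling a sample by combining basis style statistics drawn from the source distribution — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the Dirichlet weighting of basis styles, and the AdaIN-style per-channel mean/std re-styling are assumptions made for illustration; how basis styles are selected from the source distribution is left outside the sketch.

```python
import numpy as np

def hallucinate_style(feat, basis_means, basis_stds, rng):
    """Re-style a (C, H, W) feature map with a random convex combination
    of K basis styles, given as (K, C) per-channel means and stds.
    Hypothetical sketch of style hallucination, not the SHADE code."""
    # Sample convex combination weights over the K basis styles.
    w = rng.dirichlet(np.ones(len(basis_means)))
    new_mean = w @ basis_means  # (C,) hallucinated channel means
    new_std = w @ basis_stds    # (C,) hallucinated channel stds
    # Normalize each channel, then re-style with the hallucinated statistics.
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    normed = (feat - mu) / sigma
    return normed * new_std[:, None, None] + new_mean[:, None, None]
```

With a single basis style the output's channel statistics simply match that style; with several, each call yields a differently-weighted mixture, which is how such a module can produce diverse samples during training.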