Paper Title

Context-Aware Robust Fine-Tuning

Paper Authors

Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li

Paper Abstract

Contrastive Language-Image Pre-trained (CLIP) models have the zero-shot ability to classify an image as belonging to "[CLASS]" by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Based on exhaustive text prompts in "[CONTEXT]", the CLIP model is aware of different contexts, e.g., background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation to show that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture the context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in an image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT allows the context-aware ability of CLIP to be inherited by downstream tasks and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and achieves 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
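
To make the abstract's regularization idea concrete, below is a minimal PyTorch sketch using OpenAI's `clip` package: the context distribution of an image is taken as the softmax over its similarities to a set of zero-shot context prompts, and a KL-divergence term keeps the fine-tuned model's distribution close to that of the frozen original model. The prompt list, temperature, classifier head, and loss weight `lambda_ctx` are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of the context-distribution KLD regularizer described in the abstract.
# Prompt list, temperature, and lambda_ctx are hypothetical choices, not the paper's exact setup.
import copy
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)   # model being fine-tuned
frozen = copy.deepcopy(model).eval()                        # original CLIP, kept fixed
for p in frozen.parameters():
    p.requires_grad_(False)

# Hypothetical context prompts describing style/background/viewpoint contexts.
context_prompts = ["a photo of an object", "a sketch of an object",
                   "a painting of an object", "a cartoon of an object"]
with torch.no_grad():
    ctx_tokens = clip.tokenize(context_prompts).to(device)
    ctx_weights = F.normalize(frozen.encode_text(ctx_tokens).float(), dim=-1)  # zero-shot prompt weights

def context_distribution(image_encoder, images, temperature=0.01):
    """Softmax over image-to-context-prompt similarities."""
    feats = F.normalize(image_encoder(images).float(), dim=-1)
    return F.softmax(feats @ ctx_weights.t() / temperature, dim=-1)

def car_ft_loss(images, labels, classifier_head, lambda_ctx=1.0):
    """Downstream cross-entropy plus the context KLD regularizer."""
    feats = model.encode_image(images).float()
    ce = F.cross_entropy(classifier_head(feats), labels)
    p_finetuned = context_distribution(model.encode_image, images)
    with torch.no_grad():
        p_original = context_distribution(frozen.encode_image, images)
    kld = F.kl_div(p_finetuned.log(), p_original, reduction="batchmean")
    return ce + lambda_ctx * kld
```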
