Paper Title

Context-Aware Robust Fine-Tuning

Paper Authors

Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li

Paper Abstract

Contrastive Language-Image Pre-trained (CLIP) models have the zero-shot ability to classify an image as belonging to "[CLASS]" by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Based on exhaustive text prompts in "[CONTEXT]", the CLIP model is aware of different contexts, e.g., background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation to show that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture the context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in an image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT allows the context-aware ability of CLIP to be inherited by downstream tasks and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and achieves 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
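
To make the abstract's regularization idea concrete, below is a minimal PyTorch sketch using OpenAI's `clip` package: the context distribution of an image is taken as the softmax over its similarities to a set of zero-shot context prompts, and a KL-divergence term keeps the fine-tuned model's distribution close to that of the frozen original model. The prompt list, temperature, classifier head, and loss weight `lambda_ctx` are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of the context-distribution KLD regularizer described in the abstract.
# Prompt list, temperature, and lambda_ctx are hypothetical choices, not the paper's exact setup.
import copy
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)   # model being fine-tuned
frozen = copy.deepcopy(model).eval()                        # original CLIP, kept fixed
for p in frozen.parameters():
    p.requires_grad_(False)

# Hypothetical context prompts describing style/background/viewpoint contexts.
context_prompts = ["a photo of an object", "a sketch of an object",
                   "a painting of an object", "a cartoon of an object"]
with torch.no_grad():
    ctx_tokens = clip.tokenize(context_prompts).to(device)
    ctx_weights = F.normalize(frozen.encode_text(ctx_tokens).float(), dim=-1)  # zero-shot prompt weights

def context_distribution(image_encoder, images, temperature=0.01):
    """Softmax over image-to-context-prompt similarities."""
    feats = F.normalize(image_encoder(images).float(), dim=-1)
    return F.softmax(feats @ ctx_weights.t() / temperature, dim=-1)

def car_ft_loss(images, labels, classifier_head, lambda_ctx=1.0):
    """Downstream cross-entropy plus the context KLD regularizer."""
    feats = model.encode_image(images).float()
    ce = F.cross_entropy(classifier_head(feats), labels)
    p_finetuned = context_distribution(model.encode_image, images)
    with torch.no_grad():
        p_original = context_distribution(frozen.encode_image, images)
    kld = F.kl_div(p_finetuned.log(), p_original, reduction="batchmean")
    return ce + lambda_ctx * kld
```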
