Paper Title
Prompt-aligned Gradient for Prompt Tuning
Paper Authors
Paper Abstract
Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt", e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]". Therefore, prompts show great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the prompt-based similarity measure. However, we find a common failure: improper fine-tuning may undermine the prompt's inherent prediction not only for the task-related classes, but also for other classes in the VLM vocabulary. Existing methods still address this problem by using traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompts. We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned with (or non-conflicting with) the "general direction", which is represented as the gradient of the KL loss of the pre-defined prompt prediction. Extensive experiments demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Codes are available at https://github.com/BeierZhu/Prompt-align.
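The gradient-alignment rule stated in the abstract can be illustrated with a short sketch. The snippet below is a minimal PyTorch reading of that rule, not the released implementation at https://github.com/BeierZhu/Prompt-align: the function name prograd_step and the single-tensor gradient interface are assumptions made for clarity. It keeps the few-shot (task) gradient when it agrees with the "general direction" (the gradient of the KL loss toward the pre-defined, hand-crafted-prompt prediction) and otherwise projects out the conflicting component.

```python
import torch

def prograd_step(grad_task: torch.Tensor, grad_general: torch.Tensor,
                 eps: float = 1e-12) -> torch.Tensor:
    """Sketch of prompt-aligned gradient selection (assumed interface).

    grad_task:    gradient of the few-shot cross-entropy loss w.r.t. the prompt context
    grad_general: gradient of the KL loss between the tuned-prompt prediction and the
                  pre-defined (hand-crafted) prompt prediction -- the "general direction"
    """
    dot = torch.dot(grad_task.flatten(), grad_general.flatten())
    if dot >= 0:
        # Aligned (non-conflicting) with the general direction: keep the task gradient.
        return grad_task
    # Conflicting: subtract the component of grad_task that opposes the general
    # direction, i.e. project grad_task onto the plane orthogonal to grad_general.
    coeff = dot / (grad_general.flatten().norm() ** 2 + eps)
    return grad_task - coeff * grad_general
```

In practice, both gradients would be obtained (e.g., with torch.autograd.grad) with respect to the learnable prompt context vectors, and the projected gradient written back before the optimizer step; the abstract does not specify these details, so this is only an illustration of the stated update rule.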