Paper Title


Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

Authors

Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

Abstract


Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., language priors, that appear in the biased samples of the training set, making them brittle against out-of-distribution (OOD) test data. Recent methods have made promising progress in overcoming this problem by reducing the impact of biased samples on model training. However, these models reveal a trade-off: improvements on OOD data come at a severe cost to performance on in-distribution (ID) data (which is dominated by the biased samples). Therefore, we propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlations from the original training samples, and we explore several strategies for using the constructed positive samples during training. Instead of undermining the importance of biased samples in model training, our approach precisely exploits the biased samples for unbiased information that contributes to reasoning. The proposed method is compatible with various VQA backbones. We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
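The abstract describes two ingredients: building a positive view of each question by removing the words that carry the language prior, and pulling the original and positive representations together with a contrastive loss. The following is a minimal sketch of that idea, not the paper's implementation: the question-type word list, the representation vectors, and the InfoNCE-style loss form are all illustrative assumptions.

```python
import math

def strip_question_type(question, type_words=("what", "is", "are", "how", "does", "do", "color")):
    """Hypothetical positive-sample construction: drop question-category
    words (which carry the language prior), keeping the content words."""
    return " ".join(w for w in question.split() if w.lower() not in type_words)

def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: the anchor representation is
    pulled toward its constructed positive and pushed away from the
    in-batch negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))
```

In a real VQA model the vectors would come from the backbone's question/image encoder, and the positive view would be re-encoded after the prior-carrying words are removed; here plain lists stand in for those embeddings.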
