Paper Title
Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis
Paper Authors
Paper Abstract
Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since highly distinguishable representations can help improve analysis performance. Previous work on MSA has usually focused on multimodal fusion strategies, while in-depth study of modality representation learning has received less attention. Recently, contrastive learning has been confirmed to be effective at endowing learned representations with stronger discriminative ability. Inspired by this, we explore approaches to improving modality representations with contrastive learning in this study. To this end, we devise a three-stage framework with multi-view contrastive learning to refine representations for specific objectives. In the first stage, to improve the unimodal representations, we employ supervised contrastive learning to pull samples within the same class together while pushing the other samples apart. In the second stage, a self-supervised contrastive learning scheme is designed to improve the distilled unimodal representations after cross-modal interaction. Finally, we again leverage supervised contrastive learning to enhance the fused multimodal representation. After all the contrastive training, we then perform the classification task based on the frozen representations. We conduct experiments on three open datasets, and the results show the advantage of our model.
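The supervised contrastive objective mentioned in the abstract (pulling same-class samples together while pushing other samples apart) can be illustrated with a minimal NumPy sketch. This is a generic supervised contrastive loss in the style of Khosla et al. (2020), not the paper's exact implementation; the function name, temperature value, and batch shapes are illustrative assumptions.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (not the paper's exact code):
    each anchor's same-class samples act as positives, all other samples
    in the batch act as negatives."""
    # L2-normalize embeddings so dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature               # pairwise scaled similarities
    n = len(labels)
    logits_mask = ~np.eye(n, dtype=bool)      # exclude self-similarity
    # Log-softmax over all other samples, with max-subtraction for stability.
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's class label.
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                    # anchors with >=1 positive
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

As a sanity check, a batch whose same-class embeddings are nearly identical should incur a lower loss than one whose same-class embeddings point in different directions, which is exactly the "pull together / push apart" behavior the abstract describes.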