Paper Title
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Paper Authors
Paper Abstract
Change captioning is a task that aims to describe the difference between images in natural language. Most existing methods treat this problem as a difference judgment in the absence of distractors such as viewpoint changes. However, in practice, viewpoint changes occur often and can overwhelm the semantic difference to be described. In this paper, we propose a novel visual encoder that explicitly distinguishes viewpoint changes from semantic changes in the change captioning task. Moreover, we simulate the attention preference of humans and propose a novel reinforcement learning process to fine-tune the attention directly with language evaluation rewards. Extensive experimental results show that our method outperforms state-of-the-art approaches by a large margin on both the Spot-the-Diff and CLEVR-Change datasets.