Paper title
Visual Question Answering with Prior Class Semantics
Paper authors
Paper abstract
We present a novel mechanism to embed prior knowledge in a model for visual question answering. The open-set nature of the task is at odds with the ubiquitous approach of training a fixed classifier. We show how to exploit additional information pertaining to the semantics of candidate answers. We extend the answer prediction process with a regression objective in a semantic space, in which we project candidate answers using prior knowledge derived from word embeddings. We perform an extensive study of learned representations with the GQA dataset, revealing that important semantic information is captured in the relations between embeddings in the answer space. Our method brings improvements in consistency and accuracy over a range of question types. Experiments with novel answers, unseen during training, indicate the method's potential for open-set prediction.
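
The idea of predicting answers by regression in a semantic space can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss, the projection, or the embedding source, so the cosine-based loss, the three-dimensional vectors, and the names `regression_loss` and `predict_answer` are all assumptions made for illustration. The sketch scores each candidate answer by its similarity to a predicted vector, then shows how a novel answer can be added at test time from its embedding alone.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def regression_loss(predicted, target):
    """Assumed regression objective: 1 - cosine similarity to the target embedding."""
    return 1.0 - cosine(predicted, target)


def predict_answer(predicted, answer_embeddings):
    """Return the candidate whose embedding is closest to the predicted vector."""
    return max(
        answer_embeddings,
        key=lambda name: cosine(predicted, answer_embeddings[name]),
    )


# Hypothetical prior embeddings for candidate answers (toy values,
# standing in for projected word embeddings).
answer_embeddings = {
    "red": [1.0, 0.1, 0.0],
    "blue": [0.9, -0.2, 0.1],
    "dog": [0.0, 1.0, 0.2],
}

# Hypothetical output of the model's regression head for one question.
predicted = [0.1, 0.9, 0.4]

# Closed-set prediction over the answers seen during training.
best = predict_answer(predicted, answer_embeddings)

# Open-set prediction: a novel answer is added from its embedding alone,
# with no classifier weights to retrain.
answer_embeddings["cat"] = [0.1, 0.9, 0.5]
best_open = predict_answer(predicted, answer_embeddings)

print(best, best_open)
```

Because candidates are compared in a shared embedding space rather than through a fixed output layer, extending the answer set in this sketch only requires a new embedding, which is the property the abstract associates with open-set prediction.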