CQ-VQA: Visual Question Answering on Categorized Questions

Paper title

CQ-VQA: Visual Question Answering on Categorized Questions

Authors

Aakansha Mishra, Ashish Anand, Prithwijit Guha

Abstract

This paper proposes CQ-VQA, a novel two-level hierarchical yet end-to-end model for visual question answering (VQA). The first level of CQ-VQA, the question categorizer (QC), classifies each question into a category to reduce the potential answer search space. The QC operates on attended and fused features of the input question and image. The second level, the answer predictor (AP), comprises a set of distinct classifiers, one per question category. Depending on the question category predicted by the QC, only one of the AP classifiers remains active. The loss functions of the QC and AP are aggregated so that the model can be trained end to end. CQ-VQA is evaluated on the TDIUC dataset and benchmarked against state-of-the-art approaches. Results indicate that CQ-VQA performs competitively with or better than these approaches.
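
The two-level routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear classifiers, the toy weights, the unweighted sum of two cross-entropy losses, and the use of the ground-truth category to pick the AP classifier during training are all assumptions made for illustration, and every function name here is hypothetical. The fused feature vector is taken as given, since the abstract does not specify how attention and fusion are computed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class.
    return -math.log(softmax(logits)[target])

def linear(weights, features):
    # weights holds one row per output class (assumed linear classifier, no bias).
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def predict(fused, qc_weights, ap_weights):
    # Level 1: the QC picks a question category.
    category = argmax(linear(qc_weights, fused))
    # Level 2: only the AP classifier for that category is evaluated.
    answer = argmax(linear(ap_weights[category], fused))
    return category, answer

def joint_loss(fused, qc_weights, ap_weights, true_category, true_answer):
    # Assumption: the QC and AP losses are summed with equal weight, and the
    # AP classifier is selected by the ground-truth category during training.
    qc_logits = linear(qc_weights, fused)
    ap_logits = linear(ap_weights[true_category], fused)
    return cross_entropy(qc_logits, true_category) + cross_entropy(ap_logits, true_answer)

# Toy setup: 2 categories; category 0 has 2 answers, category 1 has 3.
fused = [1.0, 0.0]
qc_weights = [[0.0, 1.0], [2.0, 0.0]]
ap_weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0]],
]

category, answer = predict(fused, qc_weights, ap_weights)
loss = joint_loss(fused, qc_weights, ap_weights, true_category=1, true_answer=1)
print(category, answer, round(loss, 4))
```

Because each AP classifier only scores the answers belonging to its own category, the answer index returned by `predict` is local to that category's answer set.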
