Paper Title
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
Paper Authors
Paper Abstract
A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmark. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlap. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction, while even amateur human testers on MTurk reach 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.
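For concreteness, the sketch below shows one way the few-shot binary prediction task described in the abstract could be represented and scored: each instance pairs positive and negative support images that differ only in the action label, and a model must classify query images as containing the shared HOI concept or not. The class name, field names, and the 6+6 support split are illustrative assumptions, not the benchmark's actual data format or API.

```python
# Minimal sketch of a Bongard-HOI-style few-shot instance and its accuracy metric.
# All names here (BongardHOIInstance, evaluate_accuracy, the 6+6 support split)
# are hypothetical and chosen for illustration only.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BongardHOIInstance:
    """One few-shot binary task: decide whether each query image contains the
    HOI concept shared by the positive supports but absent from the negatives."""
    concept: str                   # e.g. "ride bicycle" (action + object category)
    positive_support: List[str]    # image paths depicting the concept
    negative_support: List[str]    # hard negatives: same object, different action
    query_images: List[str]        # images to classify
    query_labels: List[int]        # 1 if the concept is present, else 0


def evaluate_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of query images whose binary prediction matches the label."""
    assert len(predictions) == len(labels)
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    instance = BongardHOIInstance(
        concept="ride bicycle",
        positive_support=[f"pos_{i}.jpg" for i in range(6)],
        negative_support=[f"neg_{i}.jpg" for i in range(6)],
        query_images=["q_0.jpg", "q_1.jpg"],
        query_labels=[1, 0],
    )
    # A trivial constant predictor; a real model would reason over the supports.
    preds = [1 for _ in instance.query_images]
    print(f"accuracy = {evaluate_accuracy(preds, instance.query_labels):.2f}")
```

Because positives and negatives share object categories and differ only in the action, a predictor that recognizes objects alone cannot exceed chance on such instances, which is the property the reported 62% (model) vs. 91% (human) accuracy gap highlights.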