CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Paper Title

CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Authors

Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

Abstract

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings -- videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).
