Paper Title
Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task
Paper Authors
Paper Abstract
'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). RAC has long been an important research direction in Artificial Intelligence (AI), but its study with visual and linguistic inputs is relatively recent. CLEVR_HYP (Sampat et al., 2021) is one such testbed for hypothetical vision-language reasoning, with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine this encoder-decoder architecture with existing modality parsers and a scene graph question answering model to evaluate our proposed system on the CLEVR_HYP dataset. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.