Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

Paper Title

Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

Authors

Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

Abstract

'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). RAC has long been an important research direction in Artificial Intelligence (AI), but its study with visual and linguistic inputs is relatively recent. CLEVR_HYP (Sampat et al., 2021) is one such testbed for hypothetical vision-language reasoning, with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine this encoder-decoder architecture with existing modality parsers and a scene graph question answering model to evaluate our proposed system on the CLEVR_HYP dataset. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
