Causal Reasoning Meets Visual Representation Learning: A Prospective Study

Paper Title

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

Authors

Liu, Yang; Wei, Yushen; Yan, Hong; Li, Guanbin; Lin, Liang

Abstract

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. With the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization is becoming a key challenge for existing visual models. Most existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, and the field lacks unified guidance and analysis on why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards, so that reliable visual representation learning and related real-world applications can be advanced more efficiently.
