Revealing Occlusions with 4D Neural Fields

Paper Title

Revealing Occlusions with 4D Neural Fields

Authors

Basile Van Hoorick, Purva Tendulkar, Didac Suris, Dennis Park, Simon Stent, Carl Vondrick

Abstract

For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence. We introduce a framework for learning to estimate 4D visual representations from monocular RGB-D, which is able to persist objects, even once they become obstructed by occlusions. Unlike traditional video representations, we encode point clouds into a continuous representation, which permits the model to attend across the spatiotemporal context to resolve occlusions. On two large video datasets that we release along with this paper, our experiments show that the representation is able to successfully reveal occlusions for several tasks, without any architectural changes. Visualizations show that the attention mechanism automatically learns to follow occluded objects. Since our approach can be trained end-to-end and is easily adaptable, we believe it will be useful for handling occlusions in many video understanding tasks. Data, code, and models are available at https://occlusions.cs.columbia.edu/.
