Paper Title

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

Authors

Zhu Zhang, Zhou Zhao, Yang Zhao, Qi Wang, Huasheng Liu, Lianli Gao

Abstract

In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). Given an untrimmed video and a declarative or interrogative sentence depicting an object, STVG aims to localize the spatio-temporal tube of the queried object. STVG has two challenging settings: (1) we need to localize spatio-temporal object tubes from untrimmed videos, where the object may exist in only a very small segment of the video; (2) we deal with multi-form sentences, including declarative sentences with explicit objects and interrogative sentences with unknown objects. Existing methods cannot tackle the STVG task because of ineffective tube pre-generation and the lack of object relationship modeling. We therefore propose a novel Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build a spatio-temporal region graph to capture region relationships with temporal object dynamics; it comprises implicit and explicit spatial subgraphs in each frame and a temporal dynamic subgraph across frames. We then incorporate textual clues into the graph and develop multi-step cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer with a dynamic selection method to directly retrieve the spatio-temporal tubes without tube pre-generation. Moreover, we contribute a large-scale video grounding dataset, VidSTG, based on the video relation dataset VidOR. Extensive experiments demonstrate the effectiveness of our method.
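The output the abstract describes, a spatio-temporal tube, can be pictured as a temporal segment plus one bounding box per frame inside that segment. The sketch below is an illustration only and is not STGRN: it assumes that per-frame candidate regions, their relevance scores, and the temporal segment are already given (in the paper these would come from the network), and the names `Tube` and `select_tube` are hypothetical. It simply keeps the highest-scoring region in each frame of the segment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (an assumed convention).
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical container: a temporal segment and one box per frame in it."""
    start: int              # first frame index (inclusive)
    end: int                # last frame index (inclusive)
    boxes: Dict[int, Box]   # frame index -> selected box


def select_tube(
    frame_boxes: List[List[Box]],
    frame_scores: List[List[float]],
    start: int,
    end: int,
) -> Tube:
    """Greedy per-frame selection, used here only as a stand-in for a learned localizer.

    For every frame in [start, end], keep the candidate region with the
    highest score. No tubes are pre-generated; boxes are picked frame by frame.
    """
    if not 0 <= start <= end < len(frame_boxes):
        raise ValueError("segment must lie inside the video")
    boxes: Dict[int, Box] = {}
    for t in range(start, end + 1):
        scores = frame_scores[t]
        best = max(range(len(scores)), key=scores.__getitem__)
        boxes[t] = frame_boxes[t][best]
    return Tube(start=start, end=end, boxes=boxes)


# Toy input: 4 frames, 2 candidate regions per frame (made-up numbers).
frame_boxes = [
    [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0)],
    [(1.0, 0.0, 11.0, 10.0), (6.0, 5.0, 21.0, 20.0)],
    [(2.0, 0.0, 12.0, 10.0), (7.0, 5.0, 22.0, 20.0)],
    [(3.0, 0.0, 13.0, 10.0), (8.0, 5.0, 23.0, 20.0)],
]
frame_scores = [[0.1, 0.2], [0.3, 0.9], [0.8, 0.4], [0.2, 0.1]]

# The queried object is assumed to be visible only in frames 1 and 2.
tube = select_tube(frame_boxes, frame_scores, start=1, end=2)
print(tube)
```

The toy segment covers only two of the four frames, which mirrors the untrimmed-video setting in which the object appears in a small part of the video.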
