Fine-Grained Instance-Level Sketch-Based Video Retrieval

Paper Title

Fine-Grained Instance-Level Sketch-Based Video Retrieval

Authors

Peng Xu, Kun Liu, Tao Xiang, Timothy M. Hospedales, Zhanyu Ma, Jun Guo, Yi-Zhe Song

Abstract

Existing sketch-analysis work studies sketches depicting static objects or scenes. In this work, we propose a novel cross-modal retrieval problem of fine-grained instance-level sketch-based video retrieval (FG-SBVR), where a sketch sequence is used as a query to retrieve a specific target video instance. Compared with sketch-based still image retrieval and coarse-grained category-level video retrieval, this is more challenging, as both visual appearance and motion need to be matched simultaneously at a fine-grained level. We contribute the first FG-SBVR dataset with rich annotations. We then introduce a novel multi-stream multi-modality deep network to perform FG-SBVR under both strongly and weakly supervised settings. The key component of the network is a relation module, designed to prevent model over-fitting given scarce training data. We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.
