Paper Title

Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation

Authors

Daizong Liu, Shuangjie Xu, Xiao-Yang Liu, Zichuan Xu, Wei Wei, Pan Zhou

Abstract

This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting. Although previous detection-based methods achieve relatively good performance, these approaches extract the best proposal by a greedy strategy, which may lose local patch details outside the chosen candidate. In this paper, we propose a novel spatiotemporal graph neural network (STG-Net) to reconstruct more accurate masks for video object segmentation, which captures local context by utilizing all proposals. In the spatial graph, we treat the object proposals of a frame as nodes and represent their correlations with an edge weight strategy for mask context aggregation. To capture temporal information from previous frames, we use a memory network to refine the mask of the current frame by retrieving historic masks in a temporal graph. The joint use of both local patch details and temporal relationships allows us to better address challenges such as object occlusion and disappearance. Without online learning or fine-tuning, our STG-Net achieves state-of-the-art performance on four large benchmarks (DAVIS, YouTube-VOS, SegTrack-v2, and YouTube-Objects), demonstrating the effectiveness of the proposed approach.
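The spatial-graph idea described above (proposals as nodes, correlation-based edge weights, mask context aggregation) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name is hypothetical, and the edge weights here are softmax-normalized dot-product similarities, a common choice that may differ from the paper's actual edge weight strategy.

```python
import numpy as np

def aggregate_mask_context(node_feats: np.ndarray) -> np.ndarray:
    """Aggregate mask context over a fully connected spatial graph.

    node_feats: (N, D) array, one feature vector per object proposal.
    Hypothetical sketch: edge weights are row-wise softmax of pairwise
    dot-product affinities between proposal features.
    """
    sim = node_feats @ node_feats.T             # (N, N) pairwise affinities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)  # normalize edge weights
    return weights @ node_feats                 # weighted neighbor aggregation
```

Each output row is a convex combination of all proposal features, so no single "best" proposal is discarded, which is the contrast the abstract draws against greedy, detection-based selection.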
