1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Paper Title

1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Authors

Zhiwei Hu, Bo Chen, Yuan Gao, Zhilong Ji, Jinfeng Bai

Abstract

Referring video object segmentation (RVOS) aims to segment, in the frames of a given video, the object that a referring expression describes. Previous methods adopt multi-stage approaches and design complex pipelines to obtain promising results. Recently, end-to-end methods based on the Transformer have proved their superiority. In this work, we draw on the advantages of both lines of work to provide a simple and effective pipeline for RVOS. First, we improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with the language descriptions. Second, starting from a reliable, high-quality keyframe, we leverage the superior performance of a video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J&F on the Referring YouTube-VOS validation set and 63.0 on the test set. After ensembling, we achieve 64.1 on the final leaderboard, ranking 1st in the CVPR 2022 Referring YouTube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.
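
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the keyframe is chosen or which video object segmentation model is used, so the highest-confidence rule in `select_keyframe`, the nested-list mask representation, and the `propagate` callback are all hypothetical placeholders introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a binary mask as nested lists of 0/1.
Mask = List[List[int]]


def select_keyframe(scores: Sequence[float]) -> int:
    """Return the index of the frame with the highest confidence score.

    Assumption: the first-stage model yields one confidence score per frame,
    and the most confident frame is used as the keyframe.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def refine_with_propagation(
    masks: Sequence[Mask],
    scores: Sequence[float],
    propagate: Callable[[Mask, int], Mask],
) -> Tuple[int, List[Mask]]:
    """Keep the keyframe mask and re-derive all other frames from it.

    `propagate(prev_mask, t)` stands in for a video object segmentation
    model that predicts the mask of frame `t` from a neighbouring mask.
    """
    if len(masks) != len(scores):
        raise ValueError("masks and scores must have the same length")
    key = select_keyframe(scores)
    refined = list(masks)
    # Propagate forward in time from the keyframe.
    for t in range(key + 1, len(masks)):
        refined[t] = propagate(refined[t - 1], t)
    # Propagate backward in time from the keyframe.
    for t in range(key - 1, -1, -1):
        refined[t] = propagate(refined[t + 1], t)
    return key, refined


if __name__ == "__main__":
    # Toy input: three 1x1 "frames"; only the middle one is confident.
    first_stage_masks = [[[0]], [[1]], [[0]]]
    first_stage_scores = [0.2, 0.9, 0.4]
    # Trivial stand-in for a VOS model: copy the neighbouring mask.
    key, refined = refine_with_propagation(
        first_stage_masks, first_stage_scores, lambda prev, t: [row[:] for row in prev]
    )
    print(key, refined)
```

In this toy run the copy-based `propagate` simply spreads the keyframe mask to every frame; a real video object segmentation model would instead use the image content of frame `t` to adapt the mask.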
