Paper Title
Efficient Video Segmentation Models with Per-frame Inference
Paper Authors
Paper Abstract
Most existing real-time deep models trained on each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations within the video sequence into account, e.g., by propagating results to neighboring frames using optical flow or by extracting frame representations using multi-frame information, which may lead to inaccurate results or unbalanced latency. In this work, we focus on improving temporal consistency without introducing computation overhead at inference. To this end, we perform inference on each frame independently; temporal consistency is achieved by learning from video frames with extra constraints during the training phase, so no additional computation is introduced at inference. We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods. On the task of semantic video segmentation, weighing accuracy, temporal smoothness, and efficiency, our proposed method outperforms keyframe-based methods and several baseline methods trained with each frame independently, on datasets including Cityscapes, CamVid, and 300VW-Mask. We further apply our training method to video instance segmentation on YouTubeVIS, and develop an application of portrait matting in video sequences by segmenting temporally consistent instance-level trimaps across frames. Experiments show superior qualitative and quantitative results. Code is available at: https://git.io/vidseg.
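The abstract mentions a temporal consistency loss that penalizes disagreement between aligned predictions on consecutive frames during training. The following is a minimal NumPy sketch of one plausible form of such a loss; the function name, the mean-squared-error formulation, and the optical-flow-based warping assumed for `pred_t1_warped` are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def temporal_consistency_loss(pred_t, pred_t1_warped, valid_mask=None):
    """Hypothetical sketch of a temporal consistency loss.

    pred_t:         (H, W, C) per-pixel class probabilities for frame t.
    pred_t1_warped: (H, W, C) probabilities for frame t+1 warped back to
                    frame t (e.g., via optical flow), so that aligned
                    pixels should agree with frame t's predictions.
    valid_mask:     optional (H, W) boolean mask of pixels where the
                    flow (and hence the alignment) is reliable.

    Returns the mean squared difference over valid pixels. This term is
    added to the usual per-frame segmentation loss at training time only,
    so per-frame inference stays unchanged.
    """
    diff = (pred_t - pred_t1_warped) ** 2  # per-pixel, per-class squared error
    per_pixel = diff.mean(axis=-1)         # average over the class dimension
    if valid_mask is not None:
        per_pixel = per_pixel[valid_mask]  # ignore occluded/unreliable pixels
    return float(per_pixel.mean())

# Identical aligned predictions incur zero loss.
p = np.zeros((4, 4, 3))
p[..., 0] = 1.0
assert temporal_consistency_loss(p, p) == 0.0
```

Because the loss only touches training, the deployed model runs exactly as a single-frame model, which is how the method avoids the latency cost of flow-based propagation at inference.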