Paper Title
Adaptive Memory Management for Video Object Segmentation
Paper Authors
Paper Abstract
Matching-based networks have achieved state-of-the-art performance on video object segmentation (VOS) tasks by storing every k-th frame in an external memory bank for future inference. Storing the intermediate frames' predictions provides the network with richer cues for segmenting an object in the current frame. However, the size of the memory bank gradually increases with the length of the video, which slows down inference and makes it impractical to handle videos of arbitrary length. This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features. Features are indexed according to their importance in the segmentation of objects in previous frames, and based on this index we discard unimportant features to accommodate new ones. We present experiments on DAVIS 2016, DAVIS 2017, and YouTube-VOS demonstrating that our method outperforms state-of-the-art methods that employ the first-and-latest strategy with fixed-size memory banks, and achieves performance comparable to the every-k strategy with an increasing-size memory bank. Furthermore, experiments show that our method increases inference speed by up to 80% over the every-k strategy and 35% over the first-and-latest strategy.
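The abstract describes indexing stored features by their contribution to segmenting objects in previous frames and evicting the least important entries once the bank is full. The sketch below is a rough illustration of that idea only, not the paper's implementation: the class name AdaptiveMemoryBank, the cosine-similarity matching, and the accumulated-attention importance score are assumptions made here for clarity.

```python
import numpy as np

class AdaptiveMemoryBank:
    """Fixed-capacity memory bank that evicts the least important entry.

    Hypothetical sketch: importance is an accumulated usage score
    (summed matching weights), not necessarily the paper's exact criterion.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []        # per-frame key features, shape (D,)
        self.values = []      # per-frame value features (e.g., mask encodings)
        self.importance = []  # accumulated usage score per stored frame

    def add(self, key, value):
        # When full, drop the entry that contributed least to past segmentations.
        if len(self.keys) >= self.capacity:
            drop = int(np.argmin(self.importance))
            del self.keys[drop], self.values[drop], self.importance[drop]
        self.keys.append(key)
        self.values.append(value)
        self.importance.append(0.0)

    def match(self, query):
        # Cosine-similarity matching of the query frame against stored keys;
        # the resulting weights also update each entry's importance score.
        K = np.stack(self.keys)                                  # (N, D)
        sims = K @ query / (np.linalg.norm(K, axis=1) * np.linalg.norm(query) + 1e-8)
        weights = np.exp(sims) / np.exp(sims).sum()              # softmax over frames
        for i, w in enumerate(weights):
            self.importance[i] += float(w)
        return np.tensordot(weights, np.stack(self.values), axes=1)
```

In a usage loop (with hypothetical feature extractors), one would call bank.match(query_feature) for every frame to read from memory, and bank.add(key_feature, value_feature) only every k-th frame; unlike the every-k strategy, the bank never grows beyond its fixed capacity.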