Paper Title
Long-term Video Frame Interpolation via Feature Propagation
Paper Authors
Paper Abstract
Video frame interpolation (VFI) methods generally predict intermediate frame(s) by first estimating the motion between the inputs and then warping the inputs to the target time with the estimated motion. This approach, however, is not optimal when the temporal distance between the input frames increases, as existing motion estimation modules cannot effectively handle large motions. Hence, VFI methods perform well for small frame gaps and poorly as the frame gap increases. In this work, we propose a novel framework to address this problem. We argue that when there is a large gap between inputs, instead of estimating imprecise motion that will eventually lead to inaccurate interpolation, we can safely propagate from one side of the input up to a reliable time frame using the other input as a reference. Then, the rest of the intermediate frames can be interpolated using standard approaches, as the temporal gap is now narrowed. To this end, we propose a propagation network (PNet) by extending classic feature-level forecasting with a novel motion-to-feature approach. To be thorough, we adopt a simple interpolation model along with PNet as our full model and design a simple procedure to train the full model in an end-to-end manner. Experimental results on several benchmark datasets confirm the effectiveness of our method for long-term VFI compared to state-of-the-art approaches.
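The two-stage pipeline the abstract describes, propagate one input forward up to a reliable time step, then interpolate the now-narrowed remaining gap, can be sketched as follows. This is a toy illustration, not the paper's implementation: `pnet_propagate` and `interpolate` are hypothetical stand-ins (simple per-pixel blends) for the actual PNet and short-range interpolation model, and the `reliable` horizon is an assumed value.

```python
import numpy as np

def pnet_propagate(frame_a, frame_b, steps):
    """Hypothetical stand-in for PNet: propagate frame_a forward `steps`
    times, using frame_b only as a reference. Modeled here as a convex
    blend that drifts toward the reference (illustration only)."""
    propagated = []
    for k in range(1, steps + 1):
        alpha = 0.1 * k  # toy schedule, not learned motion-to-feature
        propagated.append((1 - alpha) * frame_a + alpha * frame_b)
    return propagated

def interpolate(frame_a, frame_b, n):
    """Stand-in for a standard short-range VFI model: n linear blends
    between the (now temporally close) endpoint frames."""
    return [(1 - t) * frame_a + t * frame_b
            for t in (i / (n + 1) for i in range(1, n + 1))]

# Two input frames separated by a large temporal gap of 8 time steps.
I0 = np.zeros((4, 4), dtype=np.float32)
I8 = np.ones((4, 4), dtype=np.float32)

gap = 8
reliable = 5                      # assumed reliable propagation horizon
prop = pnet_propagate(I0, I8, reliable)       # frames at t = 1..5
# Only gap - reliable - 1 = 2 frames remain: interpolate the short gap.
mid = interpolate(prop[-1], I8, gap - reliable - 1)  # frames at t = 6, 7
frames = [I0] + prop + mid + [I8]
assert len(frames) == gap + 1     # full sequence from t = 0 to t = 8
```

The point of the split is that the standard interpolation model only ever sees a small temporal gap (here 3 steps instead of 8), the regime in which motion estimation is reliable.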