Unsupervised Video Interpolation by Learning Multilayered 2.5D Motion Fields

Paper title

Unsupervised Video Interpolation by Learning Multilayered 2.5D Motion Fields

Authors

Ziang Cheng, Shihao Jiang, Hongdong Li

Abstract

Video frame interpolation increases the temporal resolution of a low-frame-rate video by synthesizing novel frames between existing, temporally sparse frames. This paper presents a self-supervised approach to video frame interpolation that requires only a single video. We pose the video as a set of layers. Each layer is parameterized by two implicit neural networks: one learns a static frame, and the other learns a time-varying motion field corresponding to the video dynamics. Together they represent an occlusion-free subset of the scene with a pseudo-depth channel. To model inter-layer occlusions, all layers are lifted to 2.5D space so that frontal layers occlude distant ones. This is done by assigning each layer a depth channel, which we call "pseudo-depth", whose partial order defines the occlusion between layers. The pseudo-depths are converted to visibility values through a fully differentiable SoftMin function, so that closer layers are more visible than distant ones. We parameterize the video motion by solving an ordinary differential equation (ODE) defined on a time-varying neural velocity field, which guarantees valid motions. This implicit neural representation learns the video as a space-time continuum, allowing frame interpolation at any temporal resolution. We demonstrate the effectiveness of our method on real-world datasets, where it achieves performance comparable to state-of-the-art methods that require ground-truth labels for training.
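
The two mechanisms named in the abstract, SoftMin visibility from pseudo-depths and motion obtained by integrating a velocity field, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `temperature` parameter, the forward-Euler solver, and the use of a plain Python callable in place of a neural velocity network are all assumptions made for illustration.

```python
import numpy as np


def softmin_visibility(pseudo_depth, temperature=0.1):
    """Turn per-layer pseudo-depths (layer axis first) into visibility weights.

    The weights sum to 1 over the layer axis, and a smaller pseudo-depth
    (a closer layer) receives a larger weight. `temperature` is a
    hypothetical sharpness parameter, not taken from the paper.
    """
    z = -np.asarray(pseudo_depth, dtype=float) / temperature
    z -= z.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=0, keepdims=True)


def composite(colors, pseudo_depth, temperature=0.1):
    """Blend K layers: colors has shape [K, H, W, C], pseudo_depth [K, H, W]."""
    w = softmin_visibility(pseudo_depth, temperature)
    return (w[..., None] * np.asarray(colors, dtype=float)).sum(axis=0)


def integrate_velocity(x0, velocity, t0, t1, steps=100):
    """Forward-Euler solve of dx/dt = velocity(x, t) from t0 to t1.

    `velocity` stands in for the time-varying neural velocity field;
    here it is any callable returning an array shaped like x.
    """
    x = np.array(x0, dtype=float)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x
```

Because the integration end time `t1` is a continuous argument, a point can be advected to any intermediate time, which is the property that allows interpolation at arbitrary temporal resolution. A higher-order ODE solver could replace the Euler loop without changing the interface.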
