Paper Title

Cross-Attention Transformer for Video Interpolation

Authors

Hannah Halin Kim, Shuzhi Yu, Shuai Yuan, Carlo Tomasi

Abstract


We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation, which aims to interpolate an intermediate frame given two consecutive image frames around it. We first present a novel vision transformer module, named Cross Similarity (CS), to globally aggregate input image features with similar appearance as those of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module to allow the network to focus on CS features from one frame over those of the other. TAIN outperforms existing methods that do not require flow estimation and performs comparably to flow-based methods while being computationally efficient in terms of inference time on Vimeo90k, UCF101, and SNU-FILM benchmarks.
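To make the Cross Similarity idea concrete, here is a minimal NumPy sketch of cross-attention where queries come from features of the predicted interpolated frame and keys/values come from one input frame, followed by a per-pixel weighting between the two frames' aggregated features in the spirit of the Image Attention module. All function names, shapes, and the fusion scheme are illustrative assumptions, not TAIN's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_similarity(pred_feat, input_feat):
    """Cross-attention sketch: queries from the predicted interpolated
    frame, keys/values from one input frame (global aggregation).

    pred_feat: (N, d) flattened spatial features of the prediction
    input_feat: (M, d) flattened spatial features of an input frame
    returns: (N, d) CS features aggregated from the input frame
    """
    d = pred_feat.shape[-1]
    attn = softmax(pred_feat @ input_feat.T / np.sqrt(d))  # (N, M)
    return attn @ input_feat

def image_attention_fuse(cs1, cs2, score1, score2):
    """Hypothetical fusion: per-pixel softmax over two scalar scores
    decides which frame's CS features to favor (e.g. under occlusion)."""
    w = softmax(np.stack([score1, score2], axis=-1))  # (N, 2)
    return w[..., 0:1] * cs1 + w[..., 1:2] * cs2
```

A usage example with random features: `cross_similarity(pred, frame1)` yields one `(N, d)` feature map per input frame, and `image_attention_fuse` blends the two maps pixel by pixel, letting the network down-weight the frame in which a region is occluded.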
