Paper Title
A Perceptual Quality Metric for Video Frame Interpolation
Paper Authors
Paper Abstract
Research on video frame interpolation has made significant progress in recent years. However, existing methods mostly use off-the-shelf metrics to measure the quality of interpolation results, except for a few methods that employ time-consuming user studies. As video frame interpolation results often exhibit unique artifacts, existing quality metrics are sometimes inconsistent with human perception when measuring them. Some recent deep learning-based perceptual quality metrics have been shown to be more consistent with human judgments, but their performance on videos is compromised because they do not consider temporal information. In this paper, we present a dedicated perceptual quality metric for measuring video frame interpolation results. Our method learns perceptual features directly from videos instead of individual frames. It compares pyramid features extracted from video frames and employs spatio-temporal modules based on Swin Transformer blocks to extract spatio-temporal information. To train our metric, we collected a new video frame interpolation quality assessment dataset. Our experiments show that our dedicated quality metric outperforms state-of-the-art methods when measuring video frame interpolation results. Our code and model are made publicly available at \url{https://github.com/hqqxyy/VFIPS}.
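To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the overall structure the abstract outlines: pyramid features are extracted per frame, reference and interpolated features are compared level by level, and a spatio-temporal module aggregates the differences over time into a single quality score. All module names, layer sizes, and the use of a generic Transformer encoder (substituted here for the paper's Swin Transformer blocks, to keep the sketch self-contained) are illustrative assumptions, not the authors' implementation; see the released code at the URL above for the actual model.

```python
# Hypothetical sketch of a learned video quality metric with the structure
# described in the abstract. NOT the authors' implementation.
import torch
import torch.nn as nn


class PyramidFeatures(nn.Module):
    """Toy 3-level convolutional pyramid applied to each frame independently."""

    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_c = 3
        for out_c in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_c, out_c, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            ))
            in_c = out_c

    def forward(self, frames):  # frames: (B*T, 3, H, W)
        feats, x = [], frames
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # list of (B*T, C_l, H_l, W_l), one entry per pyramid level


class VideoQualityMetric(nn.Module):
    def __init__(self, channels=(16, 32, 64), dim=64):
        super().__init__()
        self.pyramid = PyramidFeatures(channels)
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in channels])
        # Stand-in for the paper's Swin-Transformer-based spatio-temporal module:
        # a plain Transformer encoder attending over the frame sequence.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, ref, dis):  # (B, T, 3, H, W) reference / interpolated clips
        b, t = ref.shape[:2]
        ref_f = self.pyramid(ref.flatten(0, 1))
        dis_f = self.pyramid(dis.flatten(0, 1))
        tokens = []
        for rf, df, proj in zip(ref_f, dis_f, self.proj):
            diff = proj((rf - df).abs())           # compare pyramid features
            tokens.append(diff.mean(dim=(2, 3)))   # pool to (B*T, dim) per level
        x = torch.stack(tokens, dim=1).mean(dim=1) # fuse levels -> (B*T, dim)
        x = x.view(b, t, -1)                       # one token per frame over time
        x = self.temporal(x)                       # temporal reasoning over frames
        return self.head(x.mean(dim=1)).squeeze(-1)  # one score per clip


if __name__ == "__main__":
    metric = VideoQualityMetric()
    ref = torch.rand(2, 6, 3, 64, 64)  # two 6-frame reference clips
    dis = torch.rand(2, 6, 3, 64, 64)  # corresponding interpolated clips
    print(metric(ref, dis).shape)      # torch.Size([2])
```

In a training setup, the scalar output would be regressed against human quality judgments from the collected dataset (for example with an L1 or ranking loss); the key design point from the abstract is that the comparison operates on feature pyramids of whole clips, so temporal artifacts influence the score, rather than on frames scored independently.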