Paper Title


VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution

Paper Authors

Liangbin Xie, Xintao Wang, Honglun Zhang, Chao Dong, Ying Shan

Abstract


Most existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification, and the frames in this dataset are of low quality. As a consequence, VFSR models trained on this dataset cannot output visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over 16,000 high-fidelity clips of diverse interview scenarios. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ, we analyze the benchmarking results of several state-of-the-art algorithms under both bicubic and blind settings. See our project page: https://liangbinxie.github.io/projects/vfhq
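The "bicubic setting" mentioned above refers to low-resolution inputs synthesized by bicubic downsampling of the high-resolution frames. As a rough illustration only (not the authors' pipeline, which is not shown in the abstract), the sketch below implements a MATLAB-style antialiased bicubic downsampler in NumPy for producing LR/HR training pairs; all function names here are hypothetical:

```python
import numpy as np

def cubic(x, a=-0.5):
    """Keys bicubic kernel (a = -0.5, the convention used by MATLAB imresize)."""
    x = np.abs(x)
    out = np.zeros_like(x)
    m1 = x <= 1
    m2 = (x > 1) & (x < 2)
    out[m1] = (a + 2) * x[m1] ** 3 - (a + 3) * x[m1] ** 2 + 1
    out[m2] = a * x[m2] ** 3 - 5 * a * x[m2] ** 2 + 8 * a * x[m2] - 4 * a
    return out

def _resize_axis(img, out_len, axis):
    """Separable bicubic resampling along one axis, with antialiasing
    (the kernel is widened when downscaling)."""
    in_len = img.shape[axis]
    scale = out_len / in_len
    kscale = min(scale, 1.0)  # shrink kernel frequency when downscaling
    # map output sample centers into input coordinates
    x = (np.arange(out_len) + 0.5) / scale - 0.5
    width = 4.0 / kscale
    left = np.floor(x - width / 2).astype(int)
    taps = int(np.ceil(width)) + 2
    idx = left[:, None] + np.arange(taps)[None, :]
    weights = cubic((x[:, None] - idx) * kscale) * kscale
    weights /= weights.sum(axis=1, keepdims=True)  # normalize rows to 1
    idx = np.clip(idx, 0, in_len - 1)              # replicate borders
    img = np.moveaxis(img, axis, 0)
    out = np.einsum('ok,ok...->o...', weights, img[idx])
    return np.moveaxis(out, 0, axis)

def bicubic_downsample(hr, factor):
    """Synthesize an LR frame from an HR frame for the bicubic SR setting."""
    h, w = hr.shape[:2]
    lr = _resize_axis(hr.astype(np.float64), h // factor, axis=0)
    lr = _resize_axis(lr, w // factor, axis=1)
    return lr

# Example: a 16x16 HR frame downsampled x4 yields a 4x4 LR frame.
hr = np.ones((16, 16))
lr = bicubic_downsample(hr, 4)
```

The "blind setting", by contrast, uses unknown, mixed degradations (blur, noise, compression), so a fixed resampling kernel like the one above is only the simplest case.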
