Paper Title
Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training
Paper Authors
Paper Abstract
Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, content dependency and the temporal-memory effects of the human visual system. To address the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly comprises three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and demonstrate the superior performance of the unified model compared with state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
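To make the three-stage framework in the abstract concrete, the sketch below shows one possible PyTorch prediction head that outputs relative, perceptual, and subjective quality from a video-level feature vector. It is a minimal illustration, not the authors' released implementation (see the linked repository for that): the layer sizes, the sigmoid-style nonlinear mapping, the per-dataset linear scale alignment, and the names `ThreeStageQualityHead`, `feature_dim`, and `dataset_id` are all assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn

class ThreeStageQualityHead(nn.Module):
    """Illustrative sketch of the three-stage prediction described in the
    abstract: relative quality assessor -> nonlinear mapping ->
    dataset-specific perceptual scale alignment. All design details below
    (layer sizes, logistic-style mapping, linear alignment) are assumptions,
    not taken verbatim from the paper or its code."""

    def __init__(self, feature_dim=128, num_datasets=4):
        super().__init__()
        # Stage 1: relative quality assessor (regresses an unbounded score).
        self.relative_head = nn.Linear(feature_dim, 1)
        # Stage 2: parameters of a shared, monotonic nonlinear mapping.
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.beta = nn.Parameter(torch.tensor(0.0))
        # Stage 3: dataset-specific linear alignment to each subjective scale.
        self.scale = nn.Parameter(torch.ones(num_datasets))
        self.shift = nn.Parameter(torch.zeros(num_datasets))

    def forward(self, video_features, dataset_id):
        # Relative quality: suitable for ranking-style supervision.
        relative_q = self.relative_head(video_features).squeeze(-1)
        # Perceptual quality: bounded via a sigmoid-like nonlinear mapping.
        perceptual_q = torch.sigmoid(self.alpha * relative_q + self.beta)
        # Subjective quality: rescaled to the score range of the given dataset.
        subjective_q = self.scale[dataset_id] * perceptual_q + self.shift[dataset_id]
        return relative_q, perceptual_q, subjective_q


# Usage: a batch of 8 video-level features, with dataset_id indexing one of
# the four training datasets (e.g., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, CVD2014).
head = ThreeStageQualityHead(feature_dim=128, num_datasets=4)
feats = torch.randn(8, 128)
rel, per, sub = head(feats, dataset_id=2)
```

The per-dataset scale and shift are what make mixed datasets training possible in this sketch: the relative and perceptual stages are shared across all datasets, while only the final alignment adapts to each dataset's own subjective rating scale.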