Paper Title
SATVSR: Scenario Adaptive Transformer for Cross Scenarios Video Super-Resolution
Paper Authors
Paper Abstract
Video Super-Resolution (VSR) aims to recover sequences of high-resolution (HR) frames from low-resolution (LR) frames. Previous methods mainly utilize temporally adjacent frames to assist the reconstruction of target frames. However, in the real world, videos with fast scene switching contain a lot of irrelevant information in adjacent frames, and these VSR methods cannot adaptively distinguish and select the useful information. In contrast, building on a transformer structure suited to temporal tasks, we devise a novel scenario-adaptive video super-resolution method. Specifically, we use optical flow to label the patches in each video frame and compute attention only among patches that share the same label. We then select the most relevant labels among them to supplement the spatial-temporal information of the target frame. This design directly encourages the supplementary information to come from the same scene as much as possible. We further propose a cross-scale feature aggregation module to better handle the scale-variation problem. Compared with other video super-resolution methods, our method not only achieves significant performance gains on single-scene videos but also shows better robustness on cross-scene datasets.
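The abstract does not include code, but the core idea of label-restricted attention can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration, not the authors' implementation: the function name `scene_masked_attention`, the flattened patch layout, and the toy labels are all hypothetical. The sketch assumes patches from all frames are embedded into one sequence, each carrying an integer scene label (in the paper, derived from optical flow), and masks the attention scores so a patch only attends to patches with the same label.

```python
# Hypothetical sketch of label-restricted ("scene-masked") attention.
# Not the authors' code: names, shapes, and the labeling scheme are
# illustrative assumptions based on the abstract's description.
import torch
import torch.nn.functional as F

def scene_masked_attention(q, k, v, labels):
    """q, k, v: (N, D) patch embeddings; labels: (N,) integer scene labels."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5               # (N, N) similarity
    same_scene = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) bool mask
    # Block attention across scene boundaries; each patch always matches itself,
    # so every row of the softmax has at least one finite entry.
    scores = scores.masked_fill(~same_scene, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                           # (N, D) aggregated features

# Toy usage: 6 patches across frames, with a (hypothetical) scene cut after patch 2.
n, d = 6, 32
q = k = v = torch.randn(n, d)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
out = scene_masked_attention(q, k, v, labels)
print(out.shape)  # torch.Size([6, 32])
```

The design point the mask captures: under a hard scene cut, an unmasked transformer would still assign some attention weight to patches from the unrelated scene, contaminating the target frame's reconstruction; restricting the softmax to same-label patches removes that contamination by construction.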