Paper Title
Feature-Aligned Video Raindrop Removal with Temporal Constraints
Paper Authors
Paper Abstract
Existing adherent raindrop removal methods focus on detecting raindrop locations and then use inpainting techniques or generative networks to recover the background behind the raindrops. Yet, because adherent raindrops vary in size and appearance, detection is challenging for both single images and videos. Moreover, unlike rain streaks, adherent raindrops tend to cover the same area across several frames. To address these problems, we propose a two-stage video-based raindrop removal method. The first stage is a single image module, which generates initial clean results. The second stage is a multiple frame module, which further refines the initial results using temporal constraints, namely, by utilizing multiple input frames and enforcing temporal consistency between adjacent output frames. Our single image module employs a raindrop removal network to generate initial raindrop removal results and creates a mask representing the differences between the input and the initial output. Once the masks and initial results for consecutive frames are obtained, our multiple frame module aligns the frames at both the image and feature levels and then recovers the clean background. Our method first employs optical flow to align the frames, and then further utilizes deformable convolution layers to achieve feature-level frame alignment. To remove small raindrops and recover the correct background, the target frame is predicted from adjacent frames. A series of unsupervised losses is proposed so that our second stage, the video raindrop removal module, can self-learn from video data without ground truths. Experimental results on real videos demonstrate the state-of-the-art performance of our method, both quantitatively and qualitatively.
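To make the two-stage design concrete, below is a minimal PyTorch sketch of the pipeline structure the abstract describes: a stage-1 per-frame module producing an initial clean result and a difference mask, a stage-2 module that aligns a neighboring frame to the target frame at the image level (optical-flow warping) and at the feature level (deformable convolution) before fusing them, and an unsupervised temporal consistency loss between adjacent outputs. This is not the authors' actual architecture; the module names (`SingleImageModule`, `MultiFrameModule`, `flow_warp`, `temporal_consistency_loss`), channel widths, layer counts, and the assumption that optical flow comes from an external estimator are all illustrative.

```python
# Minimal sketch of the two-stage video raindrop removal pipeline (assumed structure).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


def flow_warp(x, flow):
    """Warp x (N,C,H,W) with a per-pixel optical flow field (N,2,H,W) via backward warping."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                            torch.arange(w, device=x.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow   # (N,2,H,W)
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                       # normalize to [-1,1]
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((gx, gy), dim=-1), align_corners=True)


class SingleImageModule(nn.Module):
    """Stage 1: per-frame raindrop removal plus a raindrop-difference mask (hypothetical CNN)."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, frame):
        initial = self.net(frame)                              # initial clean result
        mask = (frame - initial).abs().mean(1, keepdim=True)   # where raindrops were removed
        return initial, mask


class MultiFrameModule(nn.Module):
    """Stage 2: image-level alignment (flow warp) then feature-level alignment (deformable conv)."""
    def __init__(self, ch=32):
        super().__init__()
        self.feat = nn.Conv2d(3, ch, 3, padding=1)
        self.offset = nn.Conv2d(ch * 2, 2 * 3 * 3, 3, padding=1)  # offsets for a 3x3 kernel
        self.align = DeformConv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(ch * 2, 3, 3, padding=1)

    def forward(self, target, neighbor, flow):
        neighbor = flow_warp(neighbor, flow)                    # image-level alignment
        f_t, f_n = self.feat(target), self.feat(neighbor)
        offsets = self.offset(torch.cat([f_t, f_n], dim=1))
        f_n = self.align(f_n, offsets)                          # feature-level alignment
        return self.fuse(torch.cat([f_t, f_n], dim=1))          # refined clean target frame


def temporal_consistency_loss(out_t, out_t1, flow_t_to_t1):
    """Unsupervised loss: adjacent outputs should agree once warped into the same frame (assumed form)."""
    return F.l1_loss(flow_warp(out_t1, flow_t_to_t1), out_t)
```

In this sketch, stage 2 takes the stage-1 outputs of the target and neighboring frames plus a precomputed flow field; training on unlabeled video would combine the temporal consistency term above with other unsupervised losses, which the abstract mentions but does not specify.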