Paper Title
Generalizable Deepfake Detection with Phase-Based Motion Analysis
Paper Authors
Paper Abstract
We propose PhaseForensics, a DeepFake (DF) video detection method that leverages a phase-based motion representation of facial temporal dynamics. Existing methods that rely on temporal inconsistencies for DF detection offer many advantages over typical frame-based methods. However, they still show limited cross-dataset generalization and limited robustness to common distortions. These shortcomings are partially due to error-prone motion estimation and landmark tracking, or to the susceptibility of pixel-intensity-based features to spatial distortions and cross-dataset domain shifts. Our key insight for overcoming these issues is to leverage the temporal phase variations in the band-pass components of the Complex Steerable Pyramid on face sub-regions. This not only enables a robust estimate of the temporal dynamics in these regions, but is also less prone to cross-dataset variations. Furthermore, the band-pass filters used to compute the local per-frame phase form an effective defense against the perturbations commonly seen in gradient-based adversarial attacks. Overall, with PhaseForensics, we show improved distortion and adversarial robustness, and state-of-the-art cross-dataset generalization, with 91.2% video-level AUC on the challenging CelebDFv2 (a recent state-of-the-art method achieves 86.9%).
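To illustrate what "temporal phase variations in band-pass components" can mean in practice, the following is a minimal sketch and not the PhaseForensics implementation. It assumes a single complex band-pass filter (a Gaussian bump placed on one side of the frequency plane) in place of a full Complex Steerable Pyramid, and it operates on a synthetic grating rather than face sub-regions. The function names `complex_bandpass` and `temporal_phase_differences`, and the parameters `center_freq` and `sigma`, are hypothetical choices made for this example.

```python
import numpy as np

def complex_bandpass(shape, center_freq, sigma):
    """One-sided Gaussian band-pass filter in the Fourier domain.

    Assumption: a single Gaussian bump stands in for one oriented
    sub-band of a complex steerable pyramid. Because it covers only
    one side of the spectrum, the filtered signal is complex-valued
    and carries a local phase.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequency, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequency, cycles/pixel
    cy, cx = center_freq
    return np.exp(-((fy - cy) ** 2 + (fx - cx) ** 2) / (2.0 * sigma ** 2))

def temporal_phase_differences(frames, center_freq=(0.0, 0.125), sigma=0.03):
    """Per-pixel phase change between consecutive frames of shape (T, H, W)."""
    frames = np.asarray(frames, dtype=np.float64)
    filt = complex_bandpass(frames.shape[1:], center_freq, sigma)
    spectrum = np.fft.fft2(frames, axes=(-2, -1))
    response = np.fft.ifft2(spectrum * filt, axes=(-2, -1))
    # Multiplying by the conjugate of the previous frame yields the
    # phase difference already wrapped to (-pi, pi].
    return np.angle(response[1:] * np.conj(response[:-1]))

# Synthetic input: a horizontal grating at 0.125 cycles/pixel that
# moves one pixel to the right per frame.
x = np.arange(64)
frames = np.stack(
    [np.tile(np.cos(2 * np.pi * 0.125 * (x - t)), (64, 1)) for t in range(4)]
)
dphi = temporal_phase_differences(frames)
```

For this grating, a one-pixel shift at 0.125 cycles/pixel corresponds to a phase change of magnitude 2π · 0.125 = π/4 between consecutive frames, so the phase difference encodes the motion without any explicit optical-flow or landmark-tracking step. A real pipeline would repeat this over several scales and orientations and feed the resulting phase sequences to a learned classifier; those stages are outside the scope of this sketch.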