Paper Title
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Paper Authors
Paper Abstract
In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio. We identify key reasons pertaining to this and hence resolve them by learning from a powerful lip-sync discriminator. Next, we propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of the videos generated by our Wav2Lip model is almost as good as real synced videos. We provide a demo video clearly showing the substantial impact of our Wav2Lip model and evaluation benchmarks on our website: \url{cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild}. The code and models are released at this GitHub repository: \url{github.com/Rudrabha/Wav2Lip}. You can also try out the interactive demo at this link: \url{bhaasha.iiit.ac.in/lipsync}.