Paper Title
A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis
Authors
Abstract
Audio-driven talking head synthesis is a challenging task that has attracted increasing attention in recent years. Although existing methods based on 2D landmarks or 3D face models can synthesize accurate lip synchronization and rhythmic head poses for arbitrary identities, they still have limitations, such as a cut feeling in the mouth mapping and a lack of skin highlights; the morphed region is blurry compared to the surrounding face. A Keypoint Based Enhancement (KPBE) method is proposed for audio-driven free-view talking head synthesis to improve the naturalness of the generated video. First, an existing method is used as the backend to synthesize intermediate results. Then, keypoint decomposition is applied to extract video synthesis controlling parameters from the backend output and the source image. After that, the controlling parameters are composed into the source keypoints and the driving keypoints, and a motion-field-based method generates the final image from this keypoint representation. With the keypoint representation, the cut feeling in the mouth mapping and the lack of skin highlights are overcome. Experiments show that the proposed enhancement method improves the quality of talking-head videos in terms of mean opinion score.
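The decompose-then-compose step in the abstract can be illustrated with a toy sketch. This is not the authors' method: the function names and the specific keypoint math (splitting each keypoint set into a global center and local offsets, then pairing the source's center with the backend output's offsets to form driving keypoints) are illustrative assumptions only; the paper's actual decomposition and motion-field generator are learned components.

```python
# Toy sketch of keypoint decomposition and composition, assuming a
# simple "global translation + local offsets" split (an assumption,
# not the paper's learned decomposition).

def decompose(keypoints):
    # Split keypoints into a global center (mean position) and
    # local per-point offsets around that center.
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    offsets = [(x - cx, y - cy) for x, y in keypoints]
    return (cx, cy), offsets

def compose(source_center, driving_offsets):
    # Recombine: keep the source image's global placement while
    # applying the backend output's local motion.
    cx, cy = source_center
    return [(cx + dx, cy + dy) for dx, dy in driving_offsets]

# Source-image keypoints and keypoints extracted from a backend frame.
source_kp = [(10.0, 10.0), (20.0, 10.0), (15.0, 20.0)]
backend_kp = [(11.0, 12.0), (21.0, 12.0), (16.0, 22.0)]

src_center, _ = decompose(source_kp)
_, drv_offsets = decompose(backend_kp)
driving_kp = compose(src_center, drv_offsets)
```

The resulting `driving_kp` would then drive a motion-field-based generator (in the spirit of first-order-motion models) to warp the source image into the final frame.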