Paper Title
Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions
Paper Authors
Paper Abstract
Recently, talking-face video generation has received considerable attention. So far, most methods generate results with neutral expressions, or with expressions that are implicitly determined by neural networks in an uncontrollable way. In this paper, we propose a method to generate talking-face videos with continuously controllable expressions in real time. Our method is based on an important observation: in contrast to facial geometry of moderate resolution, most expression information lies in textures. We then make use of neural textures to generate high-quality talking-face videos and design a novel neural network that can generate neural textures for image frames (which we call dynamic neural textures) based on the input expression and a continuous intensity expression coding (CIEC). Our method uses a 3DMM as the 3D model to sample the dynamic neural textures. Since the 3DMM does not cover the teeth area, we propose a teeth submodule to complete the details of the teeth region. Results and an ablation study show the effectiveness of our method in generating high-quality talking-face videos with continuously controllable expressions. We also set up four baseline methods by combining existing representative methods, and compare them with our method. Experimental results, including a user study, show that our method achieves the best performance.
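To make the pipeline the abstract describes concrete, below is a minimal PyTorch sketch of the core idea: a network generates a per-frame neural texture conditioned on an expression code (the CIEC), and that texture is sampled at UV coordinates rasterized from a 3DMM. This is a sketch under stated assumptions, not the paper's implementation; the class name DynamicNeuralTexture, the layer sizes, and the code_dim, tex_channels, and tex_size parameters are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicNeuralTexture(nn.Module):
    """Illustrative sketch: map an expression code to a neural texture,
    then sample the texture with UV coordinates rendered from a 3DMM.
    Architecture details are assumptions, not taken from the paper."""

    def __init__(self, code_dim=16, tex_channels=16, tex_size=256):
        super().__init__()
        # Hypothetical decoder: expression code -> per-frame neural texture.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 4 * 4 * 128),
            nn.Unflatten(1, (128, 4, 4)),
            nn.Upsample(scale_factor=tex_size // 4, mode="bilinear"),
            nn.Conv2d(128, tex_channels, kernel_size=3, padding=1),
        )

    def forward(self, expr_code, uv_map):
        # expr_code: (B, code_dim) continuous intensity expression code (CIEC).
        # uv_map:    (B, H, W, 2) UV coordinates in [-1, 1], rasterized
        #            from the 3DMM for the current head pose.
        texture = self.decoder(expr_code)              # (B, C, tex_size, tex_size)
        # Sample the dynamic texture at screen-space UV locations.
        sampled = F.grid_sample(texture, uv_map, align_corners=False)
        return sampled                                 # (B, C, H, W) feature image

# Example: one frame at 256x256 output resolution.
model = DynamicNeuralTexture()
code = torch.randn(1, 16)                 # stand-in expression + intensity code
uv = torch.rand(1, 256, 256, 2) * 2 - 1   # stand-in for rasterized 3DMM UVs
feat = model(code, uv)                    # (1, 16, 256, 256)
```

In the full method, the sampled feature image would presumably be decoded by a neural renderer into an RGB frame, with the teeth submodule mentioned in the abstract filling in the mouth interior that the 3DMM does not cover.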