Paper title
Third Time's the Charm? Image and Video Editing with StyleGAN3
Paper authors
Paper abstract
StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages, as well as drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training, without hindering the ability to generate unaligned imagery. Next, our analysis of the disentanglement of the different latent spaces of StyleGAN3 indicates that the commonly used W/W+ spaces are more entangled than their StyleGAN2 counterparts, underscoring the benefits of using the StyleSpace for fine-grained editing. Considering image inversion, we observe that existing encoder-based techniques struggle when trained on unaligned data. We therefore propose an encoding scheme that is trained solely on aligned data yet can still invert unaligned images. Finally, we introduce a novel video inversion and editing workflow that leverages the capabilities of a fine-tuned StyleGAN3 generator to reduce texture sticking and expand the field of view of the edited video.