Paper Title
Text-to-Face Generation with StyleGAN2
Paper Authors
Abstract
Synthesizing images from text descriptions has become an active research area with the advent of Generative Adversarial Networks. The main goal is to generate photo-realistic images that are aligned with the input descriptions. Text-to-Face generation (T2F) is a sub-domain of Text-to-Image generation (T2I) that is more challenging due to the complexity and variation of facial attributes. It has a number of applications, mainly in the domain of public safety. Even though several models are available for T2F, there is still a need to improve image quality and semantic alignment. In this research, we propose a novel framework to generate facial images that are well aligned with the input descriptions. Our framework utilizes the high-resolution face generator StyleGAN2 and explores the possibility of using it for T2F. We embed text in the input latent space of StyleGAN2 using BERT embeddings and supervise the generation of facial images with text descriptions. We trained our framework on attribute-based descriptions to generate images at a resolution of 1024×1024. The generated images exhibit 57% similarity to the ground-truth images, with a face semantic distance of 0.92, outperforming state-of-the-art work. The generated images have an FID score of 118.097, and the experimental results show that our model generates promising images.
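The pipeline described in the abstract, a BERT sentence embedding projected into StyleGAN2's input latent space to condition face synthesis, can be sketched as below. This is a minimal illustration in PyTorch, not the authors' implementation: the `TextToLatent` projection network, its layer sizes, and the `load_stylegan2` loader are hypothetical stand-ins; only the use of BERT embeddings as the conditioning signal and the 1024×1024 StyleGAN2 generator are taken from the abstract.

```python
# Sketch: map an attribute-based text description to a StyleGAN2 latent code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class TextToLatent(nn.Module):
    """Hypothetical projection from a 768-dim BERT sentence embedding
    to a 512-dim StyleGAN2 input latent."""
    def __init__(self, text_dim: int = 768, latent_dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(text_emb)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()
text_to_latent = TextToLatent()

description = "A young woman with blond hair, blue eyes, and a slight smile."
tokens = tokenizer(description, return_tensors="pt")
with torch.no_grad():
    # Use the [CLS] token state as a fixed-size sentence representation.
    sentence_emb = bert(**tokens).last_hidden_state[:, 0, :]  # (1, 768)

z = text_to_latent(sentence_emb)  # (1, 512) text-conditioned latent code
# generator = load_stylegan2("ffhq-1024")  # hypothetical loader for a 1024x1024 generator
# image = generator(z)                     # (1, 3, 1024, 1024) synthesized face
```

In such a setup, the generation would be supervised by comparing the synthesized face against the ground-truth image paired with the description (e.g. via a perceptual or semantic loss), while the pretrained StyleGAN2 generator is typically kept frozen.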