SCAM! Transferring humans between images with Semantic Cross Attention Modulation

Paper title

SCAM! Transferring humans between images with Semantic Cross Attention Modulation

Authors

Nicolas Dufour, David Picard, Vicky Kalogeiton

Abstract

A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.
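
The abstract describes a generator in which image features use several latent vectors per semantic region. One way to picture that idea is a cross-attention step where each pixel may only attend to the latents that belong to its own region. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `semantic_cross_attention`, the array shapes, the absence of learned query/key/value projections, and the toy data are all hypothetical.

```python
import numpy as np


def semantic_cross_attention(pixels, latents, pixel_region, latent_region):
    """Masked cross attention (illustrative sketch, not the paper's code).

    pixels:        (n, d) pixel features used as queries
    latents:       (m, d) latent vectors used as keys and values
    pixel_region:  (n,) integer semantic label of each pixel
    latent_region: (m,) integer semantic label of each latent
    Assumes every region present in pixel_region owns at least one latent.
    """
    d = pixels.shape[1]
    scores = pixels @ latents.T / np.sqrt(d)  # (n, m) scaled dot products
    # A pixel may only attend to latents that share its semantic label.
    same_region = pixel_region[:, None] == latent_region[None, :]
    scores = np.where(same_region, scores, -np.inf)
    # Numerically stable softmax over the latent axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ latents, weights


# Toy example: 6 pixels, 2 semantic regions, 3 latents per region, d = 4.
rng = np.random.default_rng(0)
pixel_region = np.array([0, 0, 1, 1, 1, 0])
latent_region = np.repeat([0, 1], 3)
pixels = rng.normal(size=(6, 4))
latents = rng.normal(size=(6, 4))
out, weights = semantic_cross_attention(pixels, latents, pixel_region, latent_region)
```

In this sketch, keeping several latents per region lets pixels with the same label mix them with different weights, which is one plausible reading of how appearance diversity within a region could be represented.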
