Paper Title

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

Authors

Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, Wangmeng Zuo, Nan Duan

Abstract

Language-guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions or information loss of non-defective regions, giving rise to visually unappealing inpainting results. To address these issues, this paper proposes NÜWA-LIP, which combines a defect-free VQGAN (DF-VQGAN) with a multi-perspective sequence-to-sequence module (MP-S2S). In particular, DF-VQGAN introduces relative estimation to control receptive spreading and adopts symmetrical connections to protect information. MP-S2S further enhances visual information from complementary perspectives, including both low-level pixels and high-level tokens. Experiments show that DF-VQGAN is more robust than VQGAN. To evaluate the inpainting performance of our model, we built three open-domain benchmarks, on which NÜWA-LIP is also superior to recent strong baselines.
