High-Quality Pluralistic Image Completion via Code Shared VQGAN

Paper title

High-Quality Pluralistic Image Completion via Code Shared VQGAN

Authors

Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo

Abstract

PICNet pioneered the generation of multiple and diverse results for the image completion task, but it required a careful balance between the $\mathcal{KL}$ loss (diversity) and the reconstruction loss (quality), resulting in limited diversity and quality. Separately, iGPT-based architectures have been employed to infer distributions in a discrete space derived from a pixel-level pre-clustered palette, which, however, cannot generate high-quality results directly. In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at a much faster inference speed. The core of our design lies in a simple yet effective code sharing mechanism that leads to a very compact yet expressive image representation in a discrete latent domain. The compactness and richness of the representation further facilitate the subsequent deployment of a transformer to effectively learn how to composite and complete a masked image in the discrete code domain. Based on the global context well captured by the transformer and the available visual regions, we are able to sample all tokens simultaneously, which is completely different from the prevailing autoregressive approach of iGPT-based works, and leads to more than 100$\times$ faster inference speed. Experiments show that our framework is able to learn semantically rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality. Our diverse image completion framework significantly outperforms the state of the art both quantitatively and qualitatively on multiple benchmark datasets.
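
The contrast the abstract draws between simultaneous and autoregressive token sampling can be illustrated with a toy sketch. This is not the authors' implementation: `toy_model`, `MASK`, the codebook size of 16, and the uniform distributions are all hypothetical stand-ins, and the sketch only counts forward passes. It does not reproduce the reported 100$\times$ speedup, which depends on the real model and hardware.

```python
import random

MASK = -1  # hypothetical marker for a missing (masked) code index


def toy_model(tokens, codebook_size):
    """Stand-in for a transformer forward pass (an assumption, not the paper's model).

    Returns one categorical distribution per position. Here every distribution
    is uniform; a real model would condition on the visible tokens.
    """
    toy_model.calls += 1
    return [[1.0 / codebook_size] * codebook_size for _ in tokens]


toy_model.calls = 0


def sample_autoregressive(tokens, codebook_size, rng):
    """Fill masked positions one at a time: one forward pass per masked position."""
    tokens = list(tokens)
    for i, t in enumerate(tokens):
        if t == MASK:
            probs = toy_model(tokens, codebook_size)
            tokens[i] = rng.choices(range(codebook_size), weights=probs[i])[0]
    return tokens


def sample_simultaneous(tokens, codebook_size, rng):
    """Fill every masked position from a single forward pass."""
    probs = toy_model(tokens, codebook_size)
    return [
        rng.choices(range(codebook_size), weights=probs[i])[0] if t == MASK else t
        for i, t in enumerate(tokens)
    ]


rng = random.Random(0)
masked = [3, MASK, MASK, 7, MASK, 1, MASK, MASK]  # 5 masked positions

toy_model.calls = 0
ar = sample_autoregressive(masked, 16, rng)
ar_calls = toy_model.calls

toy_model.calls = 0
par = sample_simultaneous(masked, 16, rng)
par_calls = toy_model.calls

print(ar_calls, par_calls)  # 5 1
```

In this toy setting the autoregressive loop needs one forward pass per masked token, while the simultaneous variant needs a single pass regardless of how many tokens are masked. Drawing a different random sample from the same distributions gives a different completion, which is the sense in which such sampling yields multiple results.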
