Paper Title
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
Paper Authors
Paper Abstract
One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between input and guided images. Prior approaches, despite promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a dynamic sparse attention-based Transformer model, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens one position should focus on. Specifically, DynaST leverages the multi-layer nature of the Transformer structure and performs the dynamic attention scheme in a cascaded manner to refine matching results and synthesize visually pleasing outputs. In addition, we introduce a unified training objective for DynaST, making it a versatile reference-based image translation framework for both supervised and unsupervised scenarios. Extensive experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details, outperforming the state of the art while significantly reducing the computational cost. Our code is available at https://github.com/Huage001/DynaST.
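For intuition only, below is a minimal sketch of what a per-query dynamic sparse attention step could look like: each query keeps a variable number of keys instead of attending densely or to a fixed top-k. The thresholding rule, the `keep_ratio` knob, and the tensor shapes are assumptions made for illustration; they are not the authors' DynaST implementation.

```python
# Illustrative sketch of dynamic sparse attention (not the official DynaST code):
# each query attends to a data-dependent subset of keys, so the effective
# number of attended tokens varies per position.
import torch

def dynamic_sparse_attention(q, k, v, keep_ratio=0.5):
    """q, k, v: (batch, length, dim). keep_ratio is a hypothetical knob:
    keys whose attention weight falls below keep_ratio * (1 / num_keys)
    are masked out, so each query keeps a different number of tokens."""
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum('bqd,bkd->bqk', q, k) * scale      # (B, Lq, Lk)
    attn = scores.softmax(dim=-1)
    # Dynamic sparsification: drop keys whose weight is below a cutoff
    # relative to the uniform weight 1 / Lk (at least one key always survives).
    cutoff = keep_ratio / k.shape[1]
    mask = attn >= cutoff                                     # varies per query
    sparse_scores = scores.masked_fill(~mask, float('-inf'))
    sparse_attn = sparse_scores.softmax(dim=-1)
    return torch.einsum('bqk,bkd->bqd', sparse_attn, v)

# Usage example: in a cascaded, multi-layer setting (as the abstract describes),
# later layers could reuse or tighten the previous layer's sparsity pattern.
q = k = v = torch.randn(2, 64, 32)
out = dynamic_sparse_attention(q, k, v)
print(out.shape)  # torch.Size([2, 64, 32])
```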