Title
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Authors
Abstract
In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to handle this variable-size generation task: a global patch-level autoregressive model captures the dependencies between patches, and a local token-level autoregressive model captures the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related, already-generated patches as the context for the patch currently being generated, which significantly reduces computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to choose suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
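The following minimal sketch illustrates the control flow of the two-level generation described in the abstract. It makes several assumptions and is not the authors' implementation. `generation_order` is a hypothetical stand-in for the ADC that supports only two fixed directions and learns no positional embeddings. `sample_next_token` is a deterministic placeholder for the local token-level model. `NearbyContextPool` caches raw token lists for patches within a hypothetical Chebyshev-distance `extent`, whereas the real NCP may cache model states instead.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # (row, col) of a patch in the patch grid


def generation_order(rows: int, cols: int, direction: str = "raster") -> List[Pos]:
    """Hypothetical stand-in for an Arbitrary Direction Controller:
    returns the order in which patches are generated."""
    raster = [(r, c) for r in range(rows) for c in range(cols)]
    if direction == "raster":
        return raster
    if direction == "reverse":  # e.g. extending an image up/left
        return raster[::-1]
    raise ValueError(f"unknown direction: {direction}")


class NearbyContextPool:
    """Caches already-generated patches and exposes only nearby ones as context."""

    def __init__(self, extent: int) -> None:
        self.extent = extent  # max Chebyshev distance of a context patch (assumed)
        self.cache: Dict[Pos, List[int]] = {}

    def _near(self, a: Pos, b: Pos) -> bool:
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= self.extent

    def context_for(self, pos: Pos) -> List[List[int]]:
        # Only cached (i.e. already generated) patches near `pos` are returned.
        return [self.cache[p] for p in sorted(self.cache) if self._near(p, pos)]

    def add(self, pos: Pos, tokens: List[int]) -> None:
        self.cache[pos] = tokens

    def evict(self, remaining: List[Pos]) -> None:
        # Drop patches that no future patch will ever use as context.
        self.cache = {p: t for p, t in self.cache.items()
                      if any(self._near(p, q) for q in remaining)}


def sample_next_token(context: List[List[int]], prefix: List[int], vocab: int) -> int:
    """Hypothetical, deterministic placeholder for the local token-level model."""
    return (sum(map(sum, context)) + sum(prefix) + len(prefix)) % vocab


def generate(rows: int, cols: int, tokens_per_patch: int = 4, extent: int = 1,
             vocab: int = 16, direction: str = "raster"):
    order = generation_order(rows, cols, direction)
    pool = NearbyContextPool(extent)
    patches: Dict[Pos, List[int]] = {}
    peak_pool = 0
    for i, pos in enumerate(order):            # global patch-level loop
        context = pool.context_for(pos)
        tokens: List[int] = []
        for _ in range(tokens_per_patch):      # local token-level loop
            tokens.append(sample_next_token(context, tokens, vocab))
        patches[pos] = tokens
        pool.add(pos, tokens)
        pool.evict(order[i + 1:])
        peak_pool = max(peak_pool, len(pool.cache))
    return patches, peak_pool


patches, peak = generate(rows=10, cols=5)
print(len(patches), peak)
```

The peak pool size shows why this kind of cache can save computation. With `extent=1` and either direction, only patches in the current and adjacent row are retained after eviction, so the context stays bounded by about two rows of patches no matter how many rows are generated.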