Paper title
Transframer: Arbitrary Frame Prediction with Generative Models
Paper authors
Abstract
We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer achieves state-of-the-art results on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.
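The abstract describes an architecture that encodes annotated context frames and emits a sequence of compressed feature tokens for a target frame. The toy sketch below illustrates that overall data flow only; it is not the paper's implementation. The "U-Net" here is simplified to plain average pooling, the Transformer to a single untrained attention layer, and all names (`unet_encode`, `predict_target_tokens`) and shapes are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def unet_encode(frame, depth=2):
    """Toy stand-in for a U-Net encoder: repeated 2x2 average pooling
    that compresses a frame into a short sequence of feature tokens."""
    x = frame
    for _ in range(depth):
        h, w, c = x.shape
        x = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return x.reshape(-1, x.shape[-1])  # flatten spatial grid to tokens

def attention(queries, keys, values):
    """Single-head scaled dot-product attention (numerically stabilised)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def predict_target_tokens(context_frames, num_target_tokens):
    """Condition on encoded context frames and emit a sequence of
    compressed feature tokens for the target frame. Queries are random
    here; a real model would decode them autoregressively."""
    context_tokens = np.concatenate([unet_encode(f) for f in context_frames])
    queries = rng.normal(size=(num_target_tokens, context_tokens.shape[-1]))
    return attention(queries, context_tokens, context_tokens)

# Two 16x16 RGB context frames -> 8 predicted target-frame tokens.
context = [rng.normal(size=(16, 16, 3)) for _ in range(2)]
tokens = predict_target_tokens(context, num_target_tokens=8)
print(tokens.shape)  # (8, 3)
```

The point of the sketch is the conditioning structure: arbitrary context frames are compressed into a shared token pool, and the target frame is predicted as a token sequence attending over that pool, which is what lets one model cover segmentation, view synthesis, and video prediction with the same interface.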