Paper Title
Flexible Diffusion Modeling of Long Videos
Paper Authors
Abstract
We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can, at test time, sample any subset of video frames conditioned on any other subset, and we present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames of a long video are sampled, and to use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator.
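The abstract's core idea can be illustrated with a small sketch: generate a long video a few frames at a time, where each step samples a subset of latent frames conditioned on a sparse, possibly long-range subset of already-sampled frames. The `diffusion_sample` function below is a hypothetical placeholder for the paper's denoising diffusion model, and the schedule shown (fixed chunk size, randomly chosen context frames) is an assumption for illustration, not the schedule the paper actually optimizes.

```python
import random

def diffusion_sample(latent_indices, obs_indices, frames):
    # Placeholder: a real implementation would run the reverse
    # diffusion process for frames[latent_indices], conditioned
    # on the observed frames[obs_indices].
    return {i: f"frame_{i}" for i in latent_indices}

def complete_video(num_frames, chunk=4, context=3, seed=0):
    """Sample a video under a simple schedule: generate `chunk` new
    frames per step, conditioning on up to `context` previously
    sampled frames chosen at random, which permits sparse and
    long-range conditioning rather than only adjacent frames."""
    rng = random.Random(seed)
    frames = {0: "frame_0"}  # assume the first frame is given
    while len(frames) < num_frames:
        remaining = [i for i in range(num_frames) if i not in frames]
        latent = remaining[:chunk]  # frames to generate at this step
        observed = rng.sample(sorted(frames), min(context, len(frames)))
        frames.update(diffusion_sample(latent, observed, frames))
    return [frames[i] for i in range(num_frames)]

video = complete_video(16)
```

Because the sampling order and the conditioning set are both explicit inputs to each step, alternative schedules can be swapped in and compared directly, which is the flexibility the abstract refers to.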