Paper Title

Transformation-based Adversarial Video Prediction on Large-Scale Data

Authors

Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam Doron, Albin Cassirer, Karen Simonyan

Abstract

Recent breakthroughs in adversarial generative modeling have led to models capable of producing video samples of high quality, even on large and complex datasets of real-world video. In this work, we focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence. We first improve the state of the art by performing a systematic empirical study of discriminator decompositions and proposing an architecture that yields faster convergence and higher performance than previous approaches. We then analyze recurrent units in the generator, and propose a novel recurrent unit which transforms its past hidden state according to predicted motion-like features, and refines it to handle dis-occlusions, scene changes and other complex behavior. We show that this recurrent unit consistently outperforms previous designs. Our final model leads to a leap in the state-of-the-art performance, obtaining a test set Frechet Video Distance of 25.7, down from 69.2, on the large-scale Kinetics-600 dataset.
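The core idea of the proposed recurrent unit, warping the past hidden state by predicted motion-like features and then refining the result to handle dis-occlusions and scene changes, can be sketched as follows. This is an illustrative toy version, not the paper's implementation: `predict_motion` and `refine` stand in for learned networks, and the "motion" here is a single global shift applied with `np.roll` rather than a dense learned transformation.

```python
import numpy as np

def transform_refine_step(h, x, predict_motion, refine):
    # 1. Predict motion-like features from the past state and current input
    #    (here simplified to one global (dy, dx) shift).
    dy, dx = predict_motion(h, x)
    # 2. Transform the past hidden state according to the predicted motion.
    h_warped = np.roll(h, shift=(dy, dx), axis=(0, 1))
    # 3. Refine: a gate in [0, 1] blends the warped state with the new input,
    #    letting the unit overwrite regions the warp cannot explain
    #    (dis-occlusions, scene changes, and other complex behavior).
    gate = refine(h_warped, x)
    return gate * h_warped + (1.0 - gate) * x

# Stand-ins for learned functions, for illustration only.
predict_motion = lambda h, x: (1, 0)                       # shift one row down
refine = lambda h_warped, x: np.full_like(h_warped, 0.5)   # uniform blend

h = np.zeros((4, 4, 1))
h[0, 0, 0] = 1.0            # a single active cell in the past hidden state
x = np.zeros((4, 4, 1))     # current input features
h_next = transform_refine_step(h, x, predict_motion, refine)
print(h_next[1, 0, 0])      # activation moved down one row, halved by the gate
```

In the actual model the transformation is predicted per location and the gate is learned, but the sketch shows the structural difference from a plain recurrent update: the past state is spatially transformed before being combined with the input.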
