Paper Title


Winning Lottery Tickets in Deep Generative Models

Paper Authors

Neha Mukund Kalibhat, Yogesh Balaji, Soheil Feizi

Paper Abstract


The lottery ticket hypothesis suggests that sparse sub-networks of a given neural network, if initialized properly, can be trained to reach performance comparable to, or even better than, that of the original network. Prior work on lottery tickets has primarily focused on the supervised learning setup, with several papers proposing effective ways of finding "winning tickets" in classification problems. In this paper, we confirm the existence of winning tickets in deep generative models such as GANs and VAEs. We show that the popular iterative magnitude pruning approach (with late rewinding) can be used with generative losses to find the winning tickets. This approach effectively yields tickets with sparsity up to 99% for AutoEncoders, 93% for VAEs, and 89% for GANs on the CIFAR and Celeb-A datasets. We also demonstrate the transferability of winning tickets across different generative models (GANs and VAEs) sharing the same architecture, suggesting that winning tickets have inductive biases that could help train a wide range of deep generative models. Furthermore, we show the practical benefits of lottery tickets in generative models by detecting tickets, called "early-bird tickets", at very early stages of training. Through early-bird tickets, we can achieve up to an 88% reduction in floating-point operations (FLOPs) and a 54% reduction in training time, making it possible to train large-scale generative models under tight resource constraints. These results outperform existing early pruning methods like SNIP (Lee, Ajanthan, and Torr 2019) and GraSP (Wang, Zhang, and Grosse 2020). Our findings shed light on the existence of proper network initializations that could improve the convergence and stability of generative models.
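
To make the pruning procedure concrete, below is a minimal sketch of iterative magnitude pruning with late rewinding applied to a reconstruction loss, in the spirit of the approach described in the abstract. The toy autoencoder, random data, pruning fraction, and helper names (`train`, `prune_by_magnitude`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of iterative magnitude pruning (IMP) with late rewinding on a
# generative (reconstruction) loss. Hypothetical illustration only; the model,
# data, and hyperparameters are toy stand-ins for the paper's AE/VAE/GAN setups.
import copy
import torch
import torch.nn as nn

def train(model, data, masks, steps=200, lr=1e-3):
    """Train while keeping pruned weights at zero via the binary masks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(data), data)        # reconstruction loss
        loss.backward()
        opt.step()
        with torch.no_grad():                    # re-apply masks after each update
            for p, m in zip(model.parameters(), masks):
                p.mul_(m)
    return model

def prune_by_magnitude(model, masks, frac=0.2):
    """Zero out the smallest-magnitude surviving weights, globally."""
    surviving = torch.cat([p[m.bool()].abs().flatten()
                           for p, m in zip(model.parameters(), masks)])
    threshold = torch.quantile(surviving, frac)
    return [((p.abs() > threshold) & m.bool()).float()
            for p, m in zip(model.parameters(), masks)]

# Toy autoencoder and data.
torch.manual_seed(0)
data = torch.randn(256, 32)
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 32))
masks = [torch.ones_like(p) for p in model.parameters()]

# "Late rewinding": snapshot weights after a short warm-up instead of at init.
train(model, data, masks, steps=50)
rewind_state = copy.deepcopy(model.state_dict())

for round_idx in range(5):                       # IMP rounds
    train(model, data, masks)                    # train the current sub-network
    masks = prune_by_magnitude(model, masks)     # prune 20% of surviving weights
    model.load_state_dict(rewind_state)          # rewind to the early weights
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)                            # candidate winning ticket
```

The same loop applies to VAE or GAN training by swapping the reconstruction loss for the corresponding generative objective; the key ingredients are the magnitude-based mask update and rewinding to early (rather than initial) weights.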
