Paper Title

GFlowOut: Dropout with Generative Flow Networks

Authors

Dianbo Liu, Moksh Jain, Bonaventure Dossou, Qianli Shen, Salem Lahlou, Anirudh Goyal, Nikolay Malkin, Chris Emezue, Dinghuai Zhang, Nadhir Hassen, Xu Ji, Kenji Kawaguchi, Yoshua Bengio

Abstract

Bayesian inference offers principled tools to tackle many critical problems with modern neural networks, such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference and estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal, which can be difficult to approximate with standard variational inference, and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provide uncertainty estimates which lead to better performance in downstream tasks.
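
To make the baseline in the abstract concrete, the sketch below illustrates Monte Carlo Dropout: masks are drawn i.i.d. from a fixed Bernoulli distribution at test time, and predictive uncertainty is estimated from the spread across stochastic forward passes. This is a minimal PyTorch sketch of the baseline the paper contrasts against, not the GFlowOut implementation; all module and variable names are illustrative.

```python
import torch
import torch.nn as nn


class MCDropoutMLP(nn.Module):
    """A small MLP whose dropout masks are resampled on every forward pass.

    Each mask entry is drawn i.i.d. from a fixed Bernoulli(1 - p) distribution,
    i.e. the "fixed distribution" baseline the abstract contrasts with
    GFlowOut's learned, sample-dependent posterior over masks.
    """

    def __init__(self, in_dim=16, hidden=64, out_dim=1, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        self.drop = nn.Dropout(p)  # fixed Bernoulli mask distribution

    def forward(self, x):
        return self.fc2(self.drop(torch.relu(self.fc1(x))))


@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    """Estimate the predictive mean and uncertainty via stochastic passes."""
    model.train()  # keep dropout active at test time (Monte Carlo Dropout)
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # mean and spread per input


model = MCDropoutMLP()
x = torch.randn(8, 16)
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)  # torch.Size([8, 1]) torch.Size([8, 1])
```

GFlowOut's departure from this baseline is to replace the fixed `nn.Dropout` sampler with a GFlowNet-parameterized distribution over masks, so that mask probabilities can depend on the input sample; the training objective and architecture are detailed in the paper itself.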
