Paper Title
Foundation Posteriors for Approximate Probabilistic Inference
Paper Authors
Paper Abstract
Probabilistic programs provide an expressive representation language for generative models. Given a probabilistic program, we are interested in the task of posterior inference: estimating a latent variable given a set of observed variables. Existing techniques for inference in probabilistic programs often require choosing many hyper-parameters, are computationally expensive, and/or only work for restricted classes of programs. Here we formulate inference as masked language modeling: given a program, we generate a supervised dataset of variables and assignments, and randomly mask a subset of the assignments. We then train a neural network to unmask the random values, defining an approximate posterior distribution. By optimizing a single neural network across a range of programs we amortize the cost of training, yielding a "foundation" posterior able to do zero-shot inference for new programs. The foundation posterior can also be fine-tuned for a particular program and dataset by optimizing a variational inference objective. We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs.
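To make the recipe in the abstract concrete, below is a minimal sketch of the masked-assignment training idea: sample variable assignments from a (toy) generative program, randomly mask a subset, and train a network to predict the masked values, which defines an approximate posterior. All names here (`toy_program`, `MaskedPosteriorNet`) and modeling choices (a diagonal Gaussian over masked slots) are illustrative assumptions, not the authors' implementation or the paper's architecture.

```python
# Hypothetical sketch of "inference as masked language modeling" over program
# assignments. Not the paper's code; a toy stand-in for the general idea.
import torch
import torch.nn as nn

def toy_program(n_obs=3):
    """Stand-in generative program: returns a flat vector of variable assignments."""
    mu = torch.randn(())                     # latent variable
    xs = mu + 0.5 * torch.randn(n_obs)       # observed variables given the latent
    return torch.cat([mu.view(1), xs])       # [latent, obs_1, ..., obs_n]

class MaskedPosteriorNet(nn.Module):
    """Predicts a Gaussian over each masked assignment from the unmasked ones."""
    def __init__(self, n_slots, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2 * n_slots, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * n_slots),  # per-slot mean and log-std
        )

    def forward(self, values, mask):
        # Zero out masked values; the mask itself is also given as input.
        h = torch.cat([values * (1 - mask), mask], dim=-1)
        mean, log_std = self.body(h).chunk(2, dim=-1)
        return mean, log_std

n_slots = 4
net = MaskedPosteriorNet(n_slots)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    # Generate a supervised dataset of assignments and mask a random subset.
    values = torch.stack([toy_program() for _ in range(64)])
    mask = (torch.rand_like(values) < 0.3).float()
    mean, log_std = net(values, mask)
    # Gaussian negative log-likelihood on the masked slots only ("unmasking" loss).
    nll = 0.5 * ((values - mean) / log_std.exp()) ** 2 + log_std
    loss = (nll * mask).sum() / mask.sum().clamp(min=1.0)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper's setting, the analogous network is trained across many programs rather than a single toy one, which is what amortizes training and yields zero-shot inference on new programs; fine-tuning on a specific program and dataset would swap the supervised unmasking loss for a variational inference objective.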