Paper title
Scalable Spike-and-Slab
Paper authors
Paper abstract
Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George and McCulloch (1993). For a dataset with $n$ observations and $p$ covariates, $S^3$ has order $\max\{ n^2 p_t, np \}$ computational cost at iteration $t$ where $p_t$ never exceeds the number of covariates switching spike-and-slab states between iterations $t$ and $t-1$ of the Markov chain. This improves upon the order $n^2 p$ per-iteration cost of state-of-the-art implementations as, typically, $p_t$ is substantially smaller than $p$. We apply $S^3$ to synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
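To give some intuition for how a per-iteration cost of order $n^2 p_t$ rather than $n^2 p$ can arise, the following is a minimal sketch and not the authors' implementation. It assumes that a Gibbs step needs the $n \times n$ matrix $M = I_n + X D X^\top$, where $D$ is diagonal and holds one of two prior variances per covariate depending on its spike-or-slab state. The names `update_gram`, `tau0_sq`, and `tau1_sq`, and all numeric values, are hypothetical choices made for illustration. When only $p_t$ diagonal entries of $D$ change between iterations, $M$ can be corrected using just the switched columns of $X$ instead of being rebuilt from all $p$ columns.

```python
import numpy as np

def update_gram(M_prev, X, d_prev, d_new):
    """Given M_prev = I + X diag(d_prev) X^T, return I + X diag(d_new) X^T.

    Only the columns of X whose diagonal entry changed are touched, so the
    cost is O(n^2 * p_t) plus an O(p) scan, with p_t the number of switches.
    """
    switched = np.flatnonzero(d_prev != d_new)   # covariates that changed state
    X_s = X[:, switched]                         # n x p_t slice of the design
    delta = d_new[switched] - d_prev[switched]   # change in prior variance
    return M_prev + (X_s * delta) @ X_s.T, switched.size

rng = np.random.default_rng(0)
n, p = 20, 500
tau0_sq, tau1_sq = 0.01, 1.0                     # hypothetical spike / slab variances
X = rng.standard_normal((n, p))

# Indicator vectors at two consecutive iterations; exactly three covariates flip.
z_prev = rng.random(p) < 0.05
z_new = z_prev.copy()
flip = rng.choice(p, size=3, replace=False)
z_new[flip] = ~z_new[flip]

d_prev = np.where(z_prev, tau1_sq, tau0_sq)
d_new = np.where(z_new, tau1_sq, tau0_sq)

M_prev = np.eye(n) + (X * d_prev) @ X.T          # full O(n^2 p) construction
M_new, p_t = update_gram(M_prev, X, d_prev, d_new)
M_full = np.eye(n) + (X * d_new) @ X.T           # reference: rebuild from scratch
```

Here `M_new` should agree with `M_full` up to floating-point error, while the update itself only multiplies an $n \times p_t$ slice of $X$. The remaining order $np$ term in the stated cost would come from operations that must touch every covariate, such as products with the full design matrix.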