Paper Title

Bayesian Variable Selection in a Million Dimensions

Author

Jankowiak, Martin

Abstract

Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
