Paper title
Towards Bayesian Data Compression
Paper authors
Paper abstract
Efficient compression algorithms are necessary to handle the large data sets that are omnipresent in modern science. Here, a Bayesian data compression (BDC) algorithm that adapts to the specific measurement situation is derived in the context of signal reconstruction. BDC compresses a data set while conserving its posterior structure, with minimal information loss given the prior knowledge on the signal, the quantity of interest. Its basic form is valid for Gaussian priors and likelihoods. For constant noise standard deviation, basic BDC becomes equivalent to a Bayesian analog of principal component analysis. Using Metric Gaussian Variational Inference, BDC generalizes to non-linear settings. In its current form, BDC requires the storage of effective instrument response functions for the compressed data and of the corresponding noise, which encode the posterior covariance structure. Their memory demand counteracts the compression gain. To improve this, sparsity of the compressed responses can be obtained by separating the data into patches and compressing them separately. The applicability of BDC is demonstrated on synthetic data and radio astronomical data. The algorithm still needs further improvement, as the computation time of the compression and subsequent inference exceeds the time of the inference with the original data.
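The basic Gaussian case described in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the paper; it is a minimal illustration under stated assumptions: a linear measurement d = R s + n, a Gaussian prior with covariance S, and independent Gaussian noise with standard deviations `noise_std`. It keeps the k leading singular directions of the noise-whitened, prior-weighted response, which is one common way to build a PCA-like compression that preserves the posterior. The function names `compress` and `posterior` and all variable names are hypothetical.

```python
import numpy as np


def posterior(R, S, noise_std, d):
    """Posterior mean and covariance of the Gaussian linear model d = R s + n."""
    Rw = R / noise_std[:, None]          # noise-whitened response N^{-1/2} R
    D = np.linalg.inv(np.linalg.inv(S) + Rw.T @ Rw)
    m = D @ (Rw.T @ (d / noise_std))
    return m, D


def compress(R, S, noise_std, d, k):
    """Compress d to k numbers with unit noise (illustrative sketch only).

    Returns the compressed data d_c and the effective response R_c, so that
    d_c = R_c s + n_c with n_c having unit covariance.
    """
    L = np.linalg.cholesky(S)            # S = L @ L.T
    A = (R / noise_std[:, None]) @ L     # whitened, prior-weighted response
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, sigma_k, Vt_k = U[:, :k], sigma[:k], Vt[:k]
    d_c = U_k.T @ (d / noise_std)        # project whitened data on top modes
    R_c = (sigma_k[:, None] * Vt_k) @ np.linalg.inv(L)
    return d_c, R_c


# Small synthetic demonstration (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n_data, n_signal = 50, 8
R = rng.normal(size=(n_data, n_signal))
B = rng.normal(size=(n_signal, n_signal))
S = B @ B.T + np.eye(n_signal)           # symmetric positive definite prior
noise_std = rng.uniform(0.5, 2.0, size=n_data)
s_true = np.linalg.cholesky(S) @ rng.normal(size=n_signal)
d = R @ s_true + noise_std * rng.normal(size=n_data)

m_full, D_full = posterior(R, S, noise_std, d)
d_c, R_c = compress(R, S, noise_std, d, k=4)
m_c, D_c = posterior(R_c, S, np.ones(4), d_c)
print("compressed data size:", d_c.size, "of", d.size)
print("max deviation of posterior mean:", np.abs(m_c - m_full).max())
```

In this sketch, the compressed response `R_c` is a dense k × n_signal array that has to be stored alongside the k compressed data values, which illustrates the memory overhead the abstract mentions. Choosing k equal to the rank of the whitened response reproduces the original posterior exactly; smaller k trades posterior accuracy for size.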