Paper Title
Near-Optimal Data Source Selection for Bayesian Learning
Paper Authors
Paper Abstract
We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves the running time of the standard greedy algorithm while achieving comparable performance guarantees. The fast greedy algorithm can also be applied to solve the general submodular set covering problem with performance guarantees. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.
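As a rough illustration of the standard greedy approach referred to in the abstract, the sketch below implements the classical cost-benefit greedy rule for submodular set covering on a toy coverage objective. The function `greedy_submodular_cover`, the objective `f`, the cost dictionary, and the coverage target are illustrative assumptions introduced here; the paper's actual objective is a Bayesian learning performance criterion rather than plain set coverage, so this is a minimal sketch of the general technique, not the authors' algorithm.

```python
# Minimal sketch of the standard greedy rule for submodular set covering:
# repeatedly add the data source with the best marginal-gain-per-cost ratio
# until the required coverage level is reached. The toy objective below
# (number of distinct events covered) is an illustrative stand-in for the
# paper's Bayesian learning performance metric.

def greedy_submodular_cover(ground_set, f, cost, target):
    """Greedily select a subset S with f(S) >= target, ranking candidates by marginal gain per unit cost."""
    selected = set()
    value = f(selected)
    while value < target:
        best, best_ratio = None, 0.0
        for e in ground_set - selected:
            gain = f(selected | {e}) - value
            if gain <= 0:
                continue
            ratio = gain / cost[e]
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:  # no remaining element improves coverage: target unreachable
            break
        selected.add(best)
        value = f(selected)
    return selected


if __name__ == "__main__":
    # Toy instance: each "data source" covers a set of informative events;
    # f(S) counts the distinct events covered (monotone and submodular).
    sources = {
        "s1": {1, 2, 3},
        "s2": {3, 4},
        "s3": {4, 5, 6},
        "s4": {1, 6},
    }
    costs = {"s1": 3.0, "s2": 1.0, "s3": 2.0, "s4": 1.5}
    f = lambda S: len(set().union(*(sources[s] for s in S))) if S else 0
    chosen = greedy_submodular_cover(set(sources), f, costs, target=6)
    print(chosen, "total cost =", sum(costs[s] for s in chosen))
```

Under the standard analysis of this greedy rule for monotone submodular cover, the cost of the selected set is within a logarithmic factor of the optimal cost; the fast greedy algorithm described in the abstract reduces the running time while, per the paper, retaining comparable guarantees.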