论文标题
汤普森采样的统计效率
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
论文作者
论文摘要
我们研究了具有半伴兰反馈(CMAB)的随机组合多臂匪徒。在CMAB中,对于许多分布家庭(包括相互独立的结果),以及更普遍的多变量亚高斯家族(包括相互独立的结果),关于具有最佳渐近性遗憾的有效政策的问题仍然是开放的。我们建议通过分析组合汤普森采样政策(CTS)的变体来回答这两个家庭的上述问题。对于$ [0,1] $中的相互独立结果,我们建议使用beta先验对CTS进行严格的分析。然后,我们查看多元下高斯结果的更一般环境,并提出了使用高斯先验的CT进行严格的分析。最后的结果为我们提供了组合匪徒策略(ESCB)的有效抽样的替代方法,尽管最佳,但在计算上并不是有效的。
We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propose to answer the above question for these two families by analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS using Beta priors. We then look at the more general setting of multivariate sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian priors. This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB), which, although optimal, is not computationally efficient.