汤普森采样的统计效率

论文标题

汤普森采样的统计效率

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

论文作者

Perrault, Pierre, Boursier, Etienne, Perchet, Vianney, Valko, Michal

论文摘要

我们研究了具有半伴兰反馈（CMAB）的随机组合多臂匪徒。在CMAB中，对于许多分布家庭（包括相互独立的结果），以及更普遍的多变量亚高斯家族（包括相互独立的结果），关于具有最佳渐近性遗憾的有效政策的问题仍然是开放的。我们建议通过分析组合汤普森采样政策（CTS）的变体来回答这两个家庭的上述问题。对于$ [0,1] $中的相互独立结果，我们建议使用beta先验对CTS进行严格的分析。然后，我们查看多元下高斯结果的更一般环境，并提出了使用高斯先验的CT进行严格的分析。最后的结果为我们提供了组合匪徒策略（ESCB）的有效抽样的替代方法，尽管最佳，但在计算上并不是有效的。

We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propose to answer the above question for these two families by analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS using Beta priors. We then look at the more general setting of multivariate sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian priors. This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB), which, although optimal, is not computationally efficient.

下载PDF全文

下载文献需遵守相关版权规定

论文标题