组合半匪徒的统计高效，多项式时间算法

论文标题

组合半匪徒的统计高效，多项式时间算法

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

论文作者

Cuvelier, Thibaut, Combes, Richard, Gourdin, Eric

论文摘要

我们考虑一组武器$ {\ cal x} \ subset \ {0,1 \}^d $上的组合半伴侣，其中奖励在项目之间不相关。对于这个问题，算法ESCB产生最小的已知遗憾绑定到$ r（t）= {\ cal o} \ big（{d（\ ln m）^2（\ ln m）^2（\ ln t）\超过δ_{\ min}}} \ big）$ $ d $，不能在大尺寸上使用。我们提出了第一种算法，该算法在计算和统计上对于此问题而言，遗憾的$ r（t）= {\ cal o} \ big（{d（\ ln m）^2（\ ln m）^2（\ ln t）\foreΔ_{\ min}}}}} \ big）和计算复杂性$ $ {\ cal o}（t）我们的方法涉及仔细设计ESCB的大约遗憾保证，这表明可以在$ {\ cal o}（\ cal o}（t {\ bf poly}（d））$的时间内实现此近似算法，通过反复在$ {\ cal x}上最大化线性函数，以在线性预算限制方面效力，并使该方法更加易于效果，并且可以通过$ {\ cal x} $最大化。

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over Δ_{\min} }\Big)$, but it has computational complexity ${\cal O}(|{\cal X}|)$ which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over Δ_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.

下载PDF全文

下载文献需遵守相关版权规定

论文标题