论文标题
较低开关成本的多项式logit强盗
Multinomial Logit Bandit with Low Switching Cost
论文作者
论文摘要
我们以有限的适应性研究了多项式logit bandit,在实现几乎最佳的minimax遗憾时,算法会尽可能地改变其勘探动作。我们提出了两种适应性措施:分类切换成本和更细粒度的切换成本。我们提出了一个随时随地的算法(at-ducb),带有$ o(n \ log t)$分类开关,几乎匹配下限$ω(\ frac {n \ log t} {\ log log \ log \ log \ log \ log t})$。在固定水平设置中,我们的算法FH-DUCB会产生$ O(n \ log \ log t)$分类开关,与渐近下限匹配。我们还提供了ESUCB算法的项目开关费用$ O(N \ log^2 T)$。
We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $Ω(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.