较低开关成本的多项式logit强盗

论文标题

较低开关成本的多项式logit强盗

Multinomial Logit Bandit with Low Switching Cost

论文作者

Dong, Kefan, Li, Yingkai, Zhang, Qin, Zhou, Yuan

论文摘要

我们以有限的适应性研究了多项式logit bandit，在实现几乎最佳的minimax遗憾时，算法会尽可能地改变其勘探动作。我们提出了两种适应性措施：分类切换成本和更细粒度的切换成本。我们提出了一个随时随地的算法（at-ducb），带有$ o（n \ log t）$分类开关，几乎匹配下限$ω（\ frac {n \ log t} {\ log log \ log \ log \ log \ log t}）$。在固定水平设置中，我们的算法FH-DUCB会产生$ O（n \ log \ log t）$分类开关，与渐近下限匹配。我们还提供了ESUCB算法的项目开关费用$ O（N \ log^2 T）$。

We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $Ω(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.

下载PDF全文

下载文献需遵守相关版权规定

论文标题