Bandits for BMO Functions

Paper title

Bandits for BMO Functions

Authors

Wang, Tianyu; Rudin, Cynthia

Abstract

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with infinities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $δ$-regret, a regret measured against an arm that is optimal after removing a $δ$-sized portion of the arm space.
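
To illustrate the statement that a BMO function can be unbounded, the following minimal sketch numerically estimates the mean oscillation $\frac{1}{|I|}\int_I |f - f_I|$ of $f(x) = -\log x$ over intervals $I = [0, h]$, where $f_I$ is the average of $f$ on $I$. The example function is a standard textbook case and is not taken from the paper; the helper name `mean_oscillation` and the midpoint-rule discretization are assumptions made for illustration. Only intervals anchored at 0 are checked here, whereas the BMO condition takes a supremum over all intervals.

```python
import math

def mean_oscillation(f, a, b, n=100_000):
    """Midpoint-rule estimate of (1/|I|) * integral over I=[a, b] of |f - f_I|.

    Hypothetical helper for illustration; f_I is the average of f over I.
    """
    width = (b - a) / n
    values = [f(a + (k + 0.5) * width) for k in range(n)]
    avg = sum(values) / n  # estimate of f_I
    return sum(abs(v - avg) for v in values) / n

def f(x):
    # Blows up as x -> 0, so f is unbounded on (0, 1].
    return -math.log(x)

# Shrink the interval toward the singularity at 0.
oscillations = {h: mean_oscillation(f, 0.0, h) for h in (1.0, 1e-3, 1e-6)}

for h, osc in oscillations.items():
    print(f"interval [0, {h:g}]: mean oscillation ~ {osc:.4f}")
```

The estimates stay close to $2/e \approx 0.736$ for every $h$, even though $f$ itself grows without bound near 0. This is the sense in which the mean oscillation, rather than the function value, is what remains bounded.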
