Paper Title

Bandits for BMO Functions

Authors

Tianyu Wang, Cynthia Rudin

Abstract

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with infinities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $δ$-regret, a regret measured against an arm that is optimal after removing a $δ$-sized portion of the arm space.
