论文标题

方差意识到稀疏线性匪徒

Variance-Aware Sparse Linear Bandits

论文作者

Dai, Yan, Wang, Ruosong, Du, Simon S.

论文摘要

众所周知,对于稀疏的线性匪徒而言,当忽略对稀疏性的依赖性比环境尺寸小得多时,最差的minimax遗憾是$ \ \ \widetildeθ\ left(\ sqrt {dt} \ right)$ d $,其中$ d $是环境尺寸,$ t $是$ t $的数量。另一方面,在没有噪音的良性环境中,动作集是单位球体,可以使用分隔和争议来实现$ \ widetilde {\ Mathcal o}(1)$遗憾,它(几乎)独立于$ d $和$ t $。在本文中,我们介绍了稀疏线性匪徒的第一个方差遗憾保证:$ \ widetilde {\ Mathcal o} \ left(\ sqrt {d \ sum_ {t = 1}^t = 1}^tσ_t^2} 2} 2} + 1 \ right)这种界限自然会插入最差的常数恒定方差制度(即$σ_T\equivΩ(1)$)和良性确定性制度(即$σ_T\ equiv 0 $)的遗憾界限。为了实现这一差异遗憾的保证,我们开发了一个通用框架,该框架将任何方差感知的线性匪徒算法转换为以“黑色盒子”方式的稀疏线性匪徒的方差感知算法。具体而言,我们将最近的两种算法作为黑匣子来说明确实存在声称的界限,第一个算法可以处理未知的差异案例,第二个算法更有效。

It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity which is much smaller than the ambient dimension, the worst-case minimax regret is $\widetildeΘ\left(\sqrt{dT}\right)$ where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T σ_t^2} + 1\right)$, where $σ_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $σ_t \equiv Ω(1)$) and the benign deterministic regimes (i.e., $σ_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源