方差意识到稀疏线性匪徒

论文标题

方差意识到稀疏线性匪徒

Variance-Aware Sparse Linear Bandits

论文作者

Dai, Yan, Wang, Ruosong, Du, Simon S.

论文摘要

众所周知，对于稀疏的线性匪徒而言，当忽略对稀疏性的依赖性比环境尺寸小得多时，最差的minimax遗憾是$ \ \ \widetildeθ\ left（\ sqrt {dt} \ right）$ d $，其中$ d $是环境尺寸，$ t $是$ t $的数量。另一方面，在没有噪音的良性环境中，动作集是单位球体，可以使用分隔和争议来实现$ \ widetilde {\ Mathcal o}（1）$遗憾，它（几乎）独立于$ d $和$ t $。在本文中，我们介绍了稀疏线性匪徒的第一个方差遗憾保证：$ \ widetilde {\ Mathcal o} \ left（\ sqrt {d \ sum_ {t = 1}^t = 1}^tσ_t^2} 2} 2} + 1 \ right）这种界限自然会插入最差的常数恒定方差制度（即$σ_T\equivΩ（1）$）和良性确定性制度（即$σ_T\ equiv 0 $）的遗憾界限。为了实现这一差异遗憾的保证，我们开发了一个通用框架，该框架将任何方差感知的线性匪徒算法转换为以“黑色盒子”方式的稀疏线性匪徒的方差感知算法。具体而言，我们将最近的两种算法作为黑匣子来说明确实存在声称的界限，第一个算法可以处理未知的差异案例，第二个算法更有效。

It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity which is much smaller than the ambient dimension, the worst-case minimax regret is $\widetildeΘ\left(\sqrt{dT}\right)$ where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T σ_t^2} + 1\right)$, where $σ_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $σ_t \equiv Ω(1)$) and the benign deterministic regimes (i.e., $σ_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.

下载PDF全文

下载文献需遵守相关版权规定

论文标题