论文标题

使用无纤维学习者在线性匪徒中明确的最佳手臂识别

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

论文作者

Zaki, Mohammadi, Mohan, Avi, Gopalan, Aditya

论文摘要

我们研究了线性参数化的多臂匪徒中最佳手臂识别的问题。给定一组功能向量$ \ MATHCAL {X} \ subset \ MathBb {r}^d,$ a Profesity参数$δ$和一个未知的向量$θ^*,$,$ $是确定$ \ arg arg \ max_ {x max_ {x \ in \ mathcal {x} x^tin posity,至少是$ 1- $ 1- $ 1-至少1-1-1-1--对于此固定置信度($δ$ -PAC)设置的$ x^tθ^*。$,我们提出了一个明确的可实现且可证明的订单 - 最佳样品复杂算法来解决此问题。先前的方法依赖于对最小值优化甲壳的访问。该算法称为\ textIt {phase-phastion impination linear探索游戏}(peleg),在每个回合中都保持着具有$θ^*$的高概率的信心椭圆形,并使用它来消除次级优势。 Peleg通过将问题解释为两个玩家的零和零游戏,并使用低重新学习者在每轮中计算玩家的策略来依次将问题解释为两名玩家的零和鞍点,从而在最令人困惑的(即接近但不是最佳)方向上快速收缩了这种信心。我们分析了Peleg的样品复杂性,并表明它与订单相匹配,在线性匪徒设置中,样品复杂性依赖于实例的下限。我们还为符合其理论保证的算法提供了数值结果。

We study the problem of best arm identification in linearly parameterised multi-armed bandits. Given a set of feature vectors $\mathcal{X}\subset\mathbb{R}^d,$ a confidence parameter $δ$ and an unknown vector $θ^*,$ the goal is to identify $\arg\max_{x\in\mathcal{X}}x^Tθ^*$, with probability at least $1-δ,$ using noisy measurements of the form $x^Tθ^*.$ For this fixed confidence ($δ$-PAC) setting, we propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem. Previous approaches rely on access to minimax optimization oracles. The algorithm, which we call the \textit{Phased Elimination Linear Exploration Game} (PELEG), maintains a high-probability confidence ellipsoid containing $θ^*$ in each round and uses it to eliminate suboptimal arms in phases. PELEG achieves fast shrinkage of this confidence ellipsoid along the most confusing (i.e., close to, but not optimal) directions by interpreting the problem as a two player zero-sum game, and sequentially converging to its saddle point using low-regret learners to compute players' strategies in each round. We analyze the sample complexity of PELEG and show that it matches, up to order, an instance-dependent lower bound on sample complexity in the linear bandit setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源