论文标题
最大量式分组无限武器
Max-Quantile Grouped Infinite-Arm Bandits
论文作者
论文摘要
在本文中,我们考虑了一个强盗问题,其中有许多组由无限多手臂组成。每当从给定组中要求新的臂,其平均奖励将从未知的储层分布中获取(每个组不同),并且只能通过随后的手臂拉动来减少ARM平均奖励的不确定性。目的是确定其储层分布的无限臂组最高的$(1-α)$ - 刻克(例如,如果$α= \ frac {1} {2} $)使用尽可能少的手臂拉动。我们介绍了一种两步算法,该算法首先请求每个组的固定数量的臂,然后运行有限的武器分组的最大Quantile Bandit算法。我们表征了实例依赖性和最坏情况的遗憾,并为后者提供了匹配的下限,同时讨论了各种优势,劣势,算法改进以及与实例依赖性上限相关的潜在下限。
In this paper, we consider a bandit problem in which there are a number of groups each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinite-arm group whose reservoir distribution has the highest $(1-α)$-quantile (e.g., median if $α= \frac{1}{2}$), using as few total arm pulls as possible. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter, while discussing various strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with our instance-dependent upper bounds.