论文标题
与异质奖励差异的近似上$ M $ $ ARM标识
Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances
论文作者
论文摘要
我们研究奖励方差异质性在近似$ m $ harm标识设置中的效果。在这种情况下,$ i $ -th手臂的奖励遵循$σ^2_i $ -sub-sub-gaussian分发,并且代理商需要合并此知识,以最大程度地减少预期的手臂拉力数,以识别$ m $ yarm in误差$ε$的$ m $ harm y $ n $ harm中的$ m $ harm,并具有至少$ 1-δ$ $ 1-δ$。我们表明,此问题的最坏情况样本复杂性是$$θ\左(\ sum_ {i = 1}^n \ frac {σ_i^2} {ε^2} \ ln \ ln \ ln \ frac {1}δ + \ sum_____________________________________ \ ln(m) + \ sum_ {j \ in G^{l}}} \ frac {σ_j^2} {ε^2} \ text {ent}(σ^2_ {g^{r^{r^{r^}})\ right) $ \ {1,2,\ ldots,n \} $,$ \ text {ent}(\ cdot)$是一个类似熵的函数,可测量方差代理的异质性。复杂性的上限是使用分隔和诱使样式算法获得的,而匹配的下限依赖于双重配方的研究。
We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $σ^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $ε$ out of the $n$ arms, with probability at least $1-δ$. We show that the worst-case sample complexity of this problem is $$Θ\left( \sum_{i =1}^n \frac{σ_i^2}{ε^2} \ln\frac{1}δ + \sum_{i \in G^{m}} \frac{σ_i^2}{ε^2} \ln(m) + \sum_{j \in G^{l}} \frac{σ_j^2}{ε^2} \text{Ent}(σ^2_{G^{r}}) \right),$$ where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.