与异质奖励差异的近似上$ M $ $ ARM标识

论文标题

与异质奖励差异的近似上$ M $ $ ARM标识

Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances

论文作者

Zhou, Ruida, Tian, Chao

论文摘要

我们研究奖励方差异质性在近似$ m $ harm标识设置中的效果。在这种情况下，$ i $ -th手臂的奖励遵循$σ^2_i $ -sub-sub-gaussian分发，并且代理商需要合并此知识，以最大程度地减少预期的手臂拉力数，以识别$ m $ yarm in误差$ε$的$ m $ harm y $ n $ harm中的$ m $ harm，并具有至少$ 1-δ$ $ 1-δ$。我们表明，此问题的最坏情况样本复杂性是$$θ\左（\ sum_ {i = 1}^n \ frac {σ_i^2} {ε^2} \ ln \ ln \ ln \ frac {1}δ + \ sum_____________________________________ \ ln（m） + \ sum_ {j \ in G^{l}}} \ frac {σ_j^2} {ε^2} \ text {ent}（σ^2_ {g^{r^{r^{r^}}）\ right） $ \ {1，2，\ ldots，n \} $，$ \ text {ent}（\ cdot）$是一个类似熵的函数，可测量方差代理的异质性。复杂性的上限是使用分隔和诱使样式算法获得的，而匹配的下限依赖于双重配方的研究。

We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $σ^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $ε$ out of the $n$ arms, with probability at least $1-δ$. We show that the worst-case sample complexity of this problem is $$Θ\left( \sum_{i =1}^n \frac{σ_i^2}{ε^2} \ln\frac{1}δ + \sum_{i \in G^{m}} \frac{σ_i^2}{ε^2} \ln(m) + \sum_{j \in G^{l}} \frac{σ_j^2}{ε^2} \text{Ent}(σ^2_{G^{r}}) \right),$$ where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.

下载PDF全文

下载文献需遵守相关版权规定

论文标题