关于强盗固定信心识别的消除策略

论文标题

关于强盗固定信心识别的消除策略

On Elimination Strategies for Bandit Fixed-Confidence Identification

论文作者

Tirinzoni, Andrea, Degenne, Rémy

论文摘要

匪徒识别算法的消除算法，依次修剪出合理的正确答案，直到仅剩下一个，因为它们随着时间的推移会减少问题大小，因此在计算上很方便。但是，现有的淘汰策略通常不是完全自适应的（它们很少会更新其采样规则），并且不容易扩展到组合设置，在问题维度中，答案集的成倍大。另一方面，解决通用识别问题的大多数现有完全自适应策略在计算上是要求的，因为它们反复测试每个答案的正确性，而不会减少问题大小。我们表明，可以修改自适应方法以在其停止和采样规则中使用消除，从而获得这两个世界中最好的：算法（1）保持完全适应性，（2）（2）样本复杂性，这些复杂性从来没有更糟糕的是他们非限制的对应物，以及（3）在某些错误的早期消除某些错误的答案。我们在实验上确认这些好处，其中消除可显着提高适应性方法的计算复杂性，例如线性匪徒中最佳臂识别。

Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time. However, existing elimination strategies are often not fully adaptive (they update their sampling rule infrequently) and are not easy to extend to combinatorial settings, where the set of answers is exponentially large in the problem dimension. On the other hand, most existing fully-adaptive strategies to tackle general identification problems are computationally demanding since they repeatedly test the correctness of every answer, without ever reducing the problem size. We show that adaptive methods can be modified to use elimination in both their stopping and sampling rules, hence obtaining the best of these two worlds: the algorithms (1) remain fully adaptive, (2) suffer a sample complexity that is never worse of their non-elimination counterpart, and (3) provably eliminate certain wrong answers early. We confirm these benefits experimentally, where elimination improves significantly the computational complexity of adaptive methods on common tasks like best-arm identification in linear bandits.

下载PDF全文

下载文献需遵守相关版权规定

论文标题