基于索引的确定性渐近算法，用于约束多臂匪徒问题

论文标题

基于索引的确定性渐近算法，用于约束多臂匪徒问题

An Index-based Deterministic Asymptotically Optimal Algorithm for Constrained Multi-armed Bandit Problems

论文作者

Chang, Hyeong Soo

论文摘要

对于约束多臂匪徒的模型，我们表明，通过构造存在基于索引的确定性渐近算法。最优性是通过在无限范围上选择最佳可行手臂的概率的收敛性来实现的。该算法是基于Locatelli等人的“任何时间无参数阈值”算法构建的，假设最佳值已知。我们提供了一个有限的时间绑定，以1-O（| a | te^{ - t}）给出的渐近优化性的概率，其中t是地平线的大小，a是匪徒中的臂组。然后，我们以一般形式研究了算法的放松反化，该算法估计最佳值并讨论了足够大的t之后，算法的渐近优化性（示例）。

For the model of constrained multi-armed bandit, we show that by construction there exists an index-based deterministic asymptotically optimal algorithm. The optimality is achieved by the convergence of the probability of choosing an optimal feasible arm to one over infinite horizon. The algorithm is built upon Locatelli et al.'s "anytime parameter-free thresholding" algorithm under the assumption that the optimal value is known. We provide a finite-time bound to the probability of the asymptotic optimality given as 1-O(|A|Te^{-T}) where T is the horizon size and A is the set of the arms in the bandit. We then study a relaxed-version of the algorithm in a general form that estimates the optimal value and discuss the asymptotic optimality of the algorithm after a sufficiently large T with examples.

下载PDF全文

下载文献需遵守相关版权规定

论文标题