论文标题
贝叶斯最佳算法的次优性能在频繁主义者最佳手臂识别中
Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification
论文作者
论文摘要
我们考虑固定预算的最佳手臂识别问题,并在正常分布之后奖励。在此问题中,预报员将获得$ K $臂(或治疗)和$ t $时间步骤。预报器试图通过使用算法进行自适应实验来找到最大平均值的手臂。该算法的性能是通过简单的遗憾来评估的,反映了估计的最佳臂的质量。虽然频繁的简单遗憾可以相对于$ t $呈指数下降,但贝叶斯简单的遗憾在多项性上减少。本文表明,在某些参数设置下,贝叶斯的最佳算法最小化了贝叶斯简单的遗憾,并不会导致简单遗憾的指数减少。这与众多的发现形成了鲜明的对比,这些发现表明在固定采样方案中,贝叶斯和频繁方法的渐近等效性。尽管贝叶斯的最佳算法被表达为几乎不可能准确计算的递归方程式,但我们通过引入一种称为预期的贝尔曼改善的新颖概念来为将来的研究奠定基础。
We consider the fixed-budget best arm identification problem with rewards following normal distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the arm with the largest mean, via an adaptive experiment conducted using an algorithm. The algorithm's performance is evaluated by simple regret, reflecting the quality of the estimated best arm. While frequentist simple regret can decrease exponentially with respect to $T$, Bayesian simple regret decreases polynomially. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not yield an exponential decrease in simple regret under certain parameter settings. This contrasts with the numerous findings that suggest the asymptotic equivalence of Bayesian and frequentist approaches in fixed sampling regimes. Although the Bayes optimal algorithm is formulated as a recursive equation that is virtually impossible to compute exactly, we lay the groundwork for future research by introducing a novel concept termed the expected Bellman improvement.