论文标题

有限参数化的多军匪徒的有限遗憾

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

论文作者

Panaganti, Kishan, Kalathil, Dileep

论文摘要

我们考虑有限参数化的多臂匪徒的问题,其中可以根据常见的未知参数来表征潜在的随机环境的模型。学习代理未知的真实参数。但是,有限的可能参数集已知先验。我们提出了一种简单易于实现的算法,我们称之为有限的参数化上限限制(FP-UCB)算法,该算法使用有关基础参数集的信息,以用于更快的学习。特别是,我们表明FP-UCB算法在基础参数集的某些结构条件下实现了有限的遗憾。我们还表明,如果基础参数集不满足必要的结构条件,则FP-UCB算法会实现对数遗憾,但与标准UCB算法相比,前常数较小。我们还通过广泛的数值模拟来验证FP-UCB算法的出色性能。

We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, which uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves a bounded regret under some structural condition on the underlying parameter set. We also show that, if the underlying parameter set does not satisfy the necessary structural condition, the FP-UCB algorithm achieves a logarithmic regret, but with a smaller preceding constant compared to the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源