Paper title
Distributed Thompson Sampling
Paper authors
Paper abstract
We study cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt a traditional Thompson Sampling algorithm to the distributed setting. However, given the agents' ability to communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyze the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
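For orientation, the baseline that the abstract starts from can be sketched as follows: M agents each run standard Beta-Bernoulli Thompson Sampling on the same K arms with no communication. This is only a minimal illustration under stated assumptions, not the paper's elimination-based algorithm. The function names (`thompson_step`, `run_independent_agents`), the uniform Beta(1, 1) prior, and the use of pseudo-regret (gap between the best mean and the chosen arm's mean) are all assumptions made for the example.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior and return the index of the largest sample."""
    # Assumed prior: Beta(1, 1), so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_independent_agents(means, num_agents, horizon, seed=0):
    """Run num_agents non-communicating Thompson Sampling agents on Bernoulli arms.

    Returns the total pseudo-regret summed over agents, plus the per-agent
    success and failure counts for every arm.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    best_mean = max(means)
    successes = [[0] * num_arms for _ in range(num_agents)]
    failures = [[0] * num_arms for _ in range(num_agents)]
    regret = 0.0
    for _ in range(horizon):
        for agent in range(num_agents):
            arm = thompson_step(successes[agent], failures[agent], rng)
            # Bernoulli reward: 1 with probability means[arm], else 0.
            if rng.random() < means[arm]:
                successes[agent][arm] += 1
            else:
                failures[agent][arm] += 1
            regret += best_mean - means[arm]
    return regret, successes, failures


if __name__ == "__main__":
    total_regret, _, _ = run_independent_agents([0.2, 0.5, 0.8], num_agents=4, horizon=500)
    print(f"total pseudo-regret over 4 agents: {total_regret:.1f}")
```

Because the agents here never share observations, each one pays the full exploration cost on its own. That is the gap the abstract says communication may reduce.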