Paper title
Regenerative Particle Thompson Sampling
Paper authors
Paper abstract
This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice because maintaining a continuous posterior distribution is intractable. Particle Thompson sampling (PTS) is an approximation of Thompson sampling obtained by replacing the continuous distribution with a discrete distribution supported on a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on a simple heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of the fit surviving particles. Empirical evidence shows a uniform improvement from PTS to RPTS, and demonstrates the flexibility and efficacy of RPTS across a set of representative bandit problems, including an application to 5G network slicing.
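The heuristic described in the abstract can be illustrated with a minimal sketch on a Bernoulli bandit. This is not the authors' implementation: the abstract does not specify how particles are initialized, when regeneration is triggered, how new particles are drawn, or how weights are reassigned. The function names (`pts_step`, `regenerate`, `run_rpts`), the trigger rule (`W_MIN`, `DECAY_FRACTION`), the Gaussian perturbation, the `keep_fraction` parameter, and the uniform weight reset are all assumptions made for illustration.

```python
import random

EPS = 1e-3            # assumed clipping bound that keeps likelihoods strictly positive
W_MIN = 1e-3          # assumed threshold below which a particle counts as "decayed"
DECAY_FRACTION = 0.5  # assumed trigger: regenerate once this share of particles has decayed


def pts_step(particles, weights, true_probs, rng):
    """One round of particle Thompson sampling on a Bernoulli bandit."""
    # Sample one particle from the discrete approximation of the posterior.
    idx = rng.choices(range(len(particles)), weights=weights, k=1)[0]
    theta = particles[idx]
    # Play the arm that is best under the sampled particle.
    arm = max(range(len(theta)), key=lambda a: theta[a])
    reward = 1 if rng.random() < true_probs[arm] else 0
    # Bayes update: multiply each weight by the likelihood of the observed reward.
    new_w = [w * (p[arm] if reward else 1.0 - p[arm])
             for w, p in zip(weights, particles)]
    total = sum(new_w)
    return reward, [w / total for w in new_w]


def regenerate(particles, weights, rng, keep_fraction=0.5, noise=0.05):
    """Delete the lowest-weight particles and regenerate new ones near survivors."""
    n = len(particles)
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    survivors = order[:max(1, int(keep_fraction * n))]
    surv_w = [weights[i] for i in survivors]
    new_particles = [particles[i] for i in survivors]
    while len(new_particles) < n:
        # Pick a surviving parent in proportion to its weight, then perturb it
        # (assumption: Gaussian noise, clipped to stay inside (0, 1)).
        parent = particles[rng.choices(survivors, weights=surv_w, k=1)[0]]
        child = [min(1.0 - EPS, max(EPS, p + rng.gauss(0.0, noise))) for p in parent]
        new_particles.append(child)
    # Assumption: reset all weights to uniform after regeneration.
    return new_particles, [1.0 / n] * n


def run_rpts(true_probs, n_particles=50, horizon=2000, seed=0):
    """Run the sketch and return (total reward, final particles, final weights)."""
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumption: initial particles are drawn uniformly from the parameter space.
    particles = [[rng.uniform(EPS, 1.0 - EPS) for _ in range(k)]
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    total_reward = 0
    for _ in range(horizon):
        reward, weights = pts_step(particles, weights, true_probs, rng)
        total_reward += reward
        decayed = sum(w < W_MIN for w in weights)
        if decayed >= DECAY_FRACTION * n_particles:
            particles, weights = regenerate(particles, weights, rng)
    return total_reward, particles, weights


if __name__ == "__main__":
    reward, _, _ = run_rpts([0.2, 0.5, 0.8])
    print("total reward over 2000 rounds:", reward)
```

Omitting the `regenerate` call leaves plain PTS with static particles, so the two variants can be compared on the same seed. Resetting the weights to uniform is the simplest option but discards posterior information held by the survivors; a scheme that lets survivors keep their weights is an equally plausible reading of the abstract.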