Paper Title

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity

Authors

Alekh Agarwal, Tong Zhang

Abstract

We propose a general framework to design posterior sampling methods for model-based RL. We show that the proposed algorithms can be analyzed by reducing regret to Hellinger distance in conditional probability estimation. We further show that optimistic posterior sampling can control this Hellinger distance when we measure model error via data likelihood. This technique allows us to design and analyze unified posterior sampling algorithms with state-of-the-art sample complexity guarantees for many model-based RL settings. We illustrate our general result in many special cases, demonstrating the versatility of our framework.
