Paper Title

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Authors

Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

Abstract

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
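The key ingredient the abstract describes is using dropout networks so that the belief over the dynamics becomes a tractable inference problem. The sketch below is an illustration of that general idea (Monte-Carlo dropout as approximate Bayesian inference), not the paper's actual implementation: the network sizes, the `DropoutDynamicsModel` class, and all parameter names are hypothetical, and a toy NumPy network stands in for whatever architecture BADDr actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

class DropoutDynamicsModel:
    """Toy dynamics model where dropout stays ACTIVE at prediction time.

    Each stochastic forward pass then behaves like a sample from an
    approximate posterior over network weights, i.e. a sampled dynamics
    model -- the mechanism that makes the belief over dynamics scalable.
    """

    def __init__(self, state_dim, action_dim, hidden=32, p_drop=0.5):
        self.W1 = rng.normal(0.0, 0.1, (state_dim + action_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.p_drop = p_drop

    def sample_next_state(self, state, action):
        x = np.concatenate([state, action])
        h = np.maximum(0.0, x @ self.W1)              # ReLU hidden layer
        mask = rng.random(h.shape) >= self.p_drop     # dropout kept ON
        h = h * mask / (1.0 - self.p_drop)            # inverted dropout
        return h @ self.W2                            # predicted next state

model = DropoutDynamicsModel(state_dim=4, action_dim=2)
s, a = np.zeros(4), np.ones(2)

# Repeated calls give different predictions: an implicit ensemble of
# sampled dynamics models, usable as particles in a belief that a
# Monte-Carlo tree search can simulate against.
samples = np.stack([model.sample_next_state(s, a) for _ in range(10)])
print(samples.shape)  # (10, 4)
```

Under this view, maintaining the joint belief over state and dynamics reduces to carrying such stochastic forward passes through the planner, which is cheaper than explicit posterior updates over tabular models.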
