Paper Title

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Authors

Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

Abstract

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
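The key ingredient the abstract describes is using dropout networks so that the belief over the dynamics becomes a tractable inference problem. The sketch below is an illustration of that general idea (Monte-Carlo dropout as approximate Bayesian inference), not the paper's actual implementation: the network sizes, the `DropoutDynamicsModel` class, and all parameter names are hypothetical, and a toy NumPy network stands in for whatever architecture BADDr actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

class DropoutDynamicsModel:
    """Toy dynamics model where dropout stays ACTIVE at prediction time.

    Each stochastic forward pass then behaves like a sample from an
    approximate posterior over network weights, i.e. a sampled dynamics
    model -- the mechanism that makes the belief over dynamics scalable.
    """

    def __init__(self, state_dim, action_dim, hidden=32, p_drop=0.5):
        self.W1 = rng.normal(0.0, 0.1, (state_dim + action_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.p_drop = p_drop

    def sample_next_state(self, state, action):
        x = np.concatenate([state, action])
        h = np.maximum(0.0, x @ self.W1)              # ReLU hidden layer
        mask = rng.random(h.shape) >= self.p_drop     # dropout kept ON
        h = h * mask / (1.0 - self.p_drop)            # inverted dropout
        return h @ self.W2                            # predicted next state

model = DropoutDynamicsModel(state_dim=4, action_dim=2)
s, a = np.zeros(4), np.ones(2)

# Repeated calls give different predictions: an implicit ensemble of
# sampled dynamics models, usable as particles in a belief that a
# Monte-Carlo tree search can simulate against.
samples = np.stack([model.sample_next_state(s, a) for _ in range(10)])
print(samples.shape)  # (10, 4)
```

Under this view, maintaining the joint belief over state and dynamics reduces to carrying such stochastic forward passes through the planner, which is cheaper than explicit posterior updates over tabular models.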
