Paper Title
Bayesian Policy Search for Stochastic Domains
Paper Authors
Paper Abstract
AI planning can be cast as inference in probabilistic models, and probabilistic programming has been shown to be capable of policy search in partially observable domains. Prior work introduced policy search through Markov chain Monte Carlo in deterministic domains and adapted black-box variational inference to stochastic domains, though not in a strictly Bayesian sense. In this work, we cast policy search in stochastic domains as a Bayesian inference problem and provide a scheme for encoding such problems as nested probabilistic programs. We argue that probabilistic programs for policy search in stochastic domains should involve nested conditioning, and we provide an adaptation of Lightweight Metropolis-Hastings (LMH) for robust inference in such programs. We apply the proposed scheme to stochastic domains and show that policies of similar quality are learned, despite a simpler and more general inference algorithm. We believe that the proposed variant of LMH is novel and applicable to a wider class of probabilistic programs with nested conditioning.
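To make the abstract's central construction concrete, below is a minimal sketch (not the authors' code) of policy search as Bayesian inference with nested conditioning: an outer program places a prior on a policy parameter and conditions on the agent succeeding, where the success probability is produced by an inner model over the stochastic dynamics. Inference is a hand-rolled single-site Metropolis-Hastings loop in the spirit of LMH. All identifiers (`rollout`, `success_prob`, `lmh`, the toy chain MDP, and its constants) are illustrative assumptions, not APIs or domains from the paper.

```python
"""Sketch: Bayesian policy search with nested conditioning + LMH-style MH.
Toy chain MDP: start at state 0, reach state GOAL within HORIZON steps;
the environment slips (reverses the intended move) with probability SLIP.
"""
import math
import random

GOAL, HORIZON, SLIP = 5, 12, 0.2  # illustrative toy-domain constants


def rollout(theta, rng):
    """One stochastic episode: the policy intends 'right' with
    probability theta; the environment flips the move with prob SLIP."""
    s = 0
    for _ in range(HORIZON):
        intend_right = rng.random() < theta                  # policy choice
        move_right = intend_right != (rng.random() < SLIP)   # env noise (XOR)
        s += 1 if move_right else -1
        if s >= GOAL:
            return True
    return False


def success_prob(theta, n=200, rng=random):
    """Inner (nested) model: Monte Carlo estimate of
    P(reach goal | theta) over the stochastic dynamics."""
    return sum(rollout(theta, rng) for _ in range(n)) / n


def log_joint(theta):
    """Outer program: uniform prior on theta in (0,1), conditioned on
    success.  The nested success-probability estimate serves as the
    likelihood of the observed 'success' event."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    p = success_prob(theta)
    return math.log(p) if p > 0 else -math.inf


def lmh(n_iters=2000, step=0.1):
    """Single-site Metropolis-Hastings over the (here one-dimensional)
    trace: perturb one random choice, re-score the program, accept or
    reject.  Because the likelihood is itself an unbiased Monte Carlo
    estimate, this behaves like a pseudo-marginal MH variant."""
    theta = random.random()
    lp = log_joint(theta)
    samples = []
    for _ in range(n_iters):
        prop = theta + random.gauss(0.0, step)  # resample one site
        lp_prop = log_joint(prop)
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples


if __name__ == "__main__":
    samples = lmh()
    post = samples[len(samples) // 2:]  # discard burn-in
    print(f"posterior mean of theta: {sum(post) / len(post):.3f}")
```

The nesting is the key design point: `log_joint` (the outer program) does not observe the environment's random choices directly; it conditions on a quantity that is itself the result of inference-like estimation inside `success_prob`. This is the structure the abstract refers to as nested conditioning, and it is what the proposed LMH adaptation is designed to handle robustly.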