Paper Title

Bayesian Optimized Monte Carlo Planning

Authors

John Mern, Anil Yildiz, Zachary Sunberg, Tapan Mukerji, Mykel J. Kochenderfer

Abstract

Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree. The performance of progressive widening search is dependent upon the action sampling policy, often requiring problem-specific samplers. In this work, we present a general method for efficient action sampling based on Bayesian optimization. The proposed method uses a Gaussian process to model a belief over the action-value function and selects the action that will maximize the expected improvement in the optimal action value. We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning (BOMCP). Several experiments show that BOMCP is better able to scale to large action space POMDPs than existing state-of-the-art tree search solvers.
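
The action-selection step described in the abstract (fit a Gaussian process over observed action values, then pick the action maximizing expected improvement) can be illustrated with a brief sketch. The snippet below is a minimal illustration under stated assumptions, not the BOMCP implementation: the helper names `expected_improvement` and `select_action`, the use of scikit-learn's `GaussianProcessRegressor` with an RBF kernel, and the closed-form expected-improvement acquisition are illustrative choices rather than details confirmed by the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(mu, sigma, best_q, xi=0.01):
    """Standard closed-form expected improvement under a Gaussian posterior."""
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive variance
    z = (mu - best_q - xi) / sigma
    return (mu - best_q - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def select_action(tried_actions, tried_values, candidate_actions):
    """Fit a GP to the (action, value) pairs already tried at a node and
    return the candidate action with the highest expected improvement."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(np.atleast_2d(tried_actions), tried_values)
    mu, sigma = gp.predict(np.atleast_2d(candidate_actions), return_std=True)
    ei = expected_improvement(mu, sigma, best_q=np.max(tried_values))
    return candidate_actions[int(np.argmax(ei))]

# Toy usage: a 1-D continuous action space with a few previously tried actions.
tried = np.array([[0.1], [0.5], [0.9]])
values = np.array([0.2, 0.8, 0.3])  # estimated action values at the node
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
print(select_action(tried, values, candidates))
```

In a progressive-widening tree search, a routine of this kind would replace uniform action sampling when a node is allowed to expand, biasing new children toward actions the surrogate model predicts are most promising.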
