Paper Title

Online Planning for Constrained POMDPs with Continuous Spaces through Dual Ascent

Authors

Arec Jamgochian, Anthony Corso, Mykel J. Kochenderfer

Abstract

Rather than augmenting rewards with penalties for undesired behavior, Constrained Partially Observable Markov Decision Processes (CPOMDPs) plan safely by imposing inviolable hard constraint value budgets. Previous work performing online planning for CPOMDPs has only been applied to discrete action and observation spaces. In this work, we propose algorithms for online CPOMDP planning for continuous state, action, and observation spaces by combining dual ascent with progressive widening. We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems. Additionally, we compare against the use of online solvers for continuous unconstrained POMDPs that scalarize cost constraints into rewards, and investigate the effect of optimistic cost propagation.
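The dual-ascent idea mentioned in the abstract can be illustrated with a minimal sketch: the planner optimizes a Lagrangian-scalarized objective r(a) − λ·c(a), while an outer loop updates the multiplier λ by gradient ascent on the dual, raising λ when expected cost exceeds the budget and lowering it (toward zero) otherwise. Everything below is a toy illustration, not the paper's implementation: the action set, values, budget, and step size are all hypothetical, and a real online planner (e.g., a tree search with progressive widening) would replace the `plan` stand-in.

```python
# Toy action set: each action has an expected reward and expected cost.
# These numbers are illustrative, not from the paper.
ACTIONS = {
    "safe":  {"reward": 1.0, "cost": 0.0},
    "risky": {"reward": 3.0, "cost": 2.0},
}

def plan(lam):
    """Stand-in for an online planner: pick the action maximizing the
    Lagrangian-scalarized value r(a) - lam * c(a)."""
    return max(ACTIONS, key=lambda a: ACTIONS[a]["reward"] - lam * ACTIONS[a]["cost"])

def dual_ascent(budget, step=0.5, iters=50):
    """Outer dual-ascent loop: adjust the multiplier lam so that the
    planner's expected cost respects the budget. lam stays nonnegative."""
    lam = 0.0
    for _ in range(iters):
        a = plan(lam)
        # Gradient ascent on the dual: subgradient is (cost - budget).
        lam = max(0.0, lam + step * (ACTIONS[a]["cost"] - budget))
    return plan(lam), lam

action, lam = dual_ascent(budget=1.0)
print(action, lam)
```

With a fixed step size the multiplier can oscillate around the constraint boundary; a decaying step schedule is the usual remedy. The key property is that the converged λ prices the cost constraint into the reward, so an unconstrained planner can be reused on the scalarized objective.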
