Learning and Solving Regular Decision Processes

Paper Title

Learning and Solving Regular Decision Processes

Authors

Abadi, Eden; Brafman, Ronen I.

Abstract

Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards. The non-Markovian behavior is restricted to depend on regular properties of the history. These can be specified using regular expressions or formulas in linear dynamic logic over finite traces. Fully specified RDPs can be solved by compiling them into an appropriate MDP. Learning RDPs from data is a challenging problem that has yet to be addressed, on which we focus in this paper. Our approach rests on a new representation for RDPs using Mealy Machines that emit a distribution and an expected reward for each state-action pair. Building on this representation, we combine automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS to it. We empirically evaluate this approach, demonstrating its feasibility.
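To make the Mealy machine representation described in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class name `RDPMealyMachine`, the toy "key" domain, and all state, action, and observation names are assumptions made purely for illustration. The machine state summarizes the history, and for each (state, action) pair the machine emits a distribution over observations together with an expected reward.

```python
from dataclasses import dataclass


@dataclass
class RDPMealyMachine:
    """Hypothetical container for a Mealy-machine view of an RDP."""
    initial: str
    # (machine state, action, observation) -> next machine state
    delta: dict
    # (machine state, action) -> (distribution over observations, expected reward)
    output: dict

    def state_after(self, history):
        """Run the machine over a history of (action, observation) pairs."""
        q = self.initial
        for action, obs in history:
            q = self.delta[(q, action, obs)]
        return q

    def predict(self, history, action):
        """Return the emitted (distribution, expected reward) for this history and action."""
        return self.output[(self.state_after(history), action)]


# Toy domain (an assumption): "press" is rewarded only if the observation
# "key" has appeared somewhere in the history, which is a regular property.
STATES = ("no_key", "has_key")
ACTIONS = ("look", "press")
OBSERVATIONS = ("key", "nothing")

delta = {
    (q, a, o): "has_key" if (q == "has_key" or o == "key") else "no_key"
    for q in STATES for a in ACTIONS for o in OBSERVATIONS
}

output = {
    ("no_key", "look"): ({"key": 0.5, "nothing": 0.5}, 0.0),
    ("no_key", "press"): ({"nothing": 1.0}, 0.0),
    ("has_key", "look"): ({"nothing": 1.0}, 0.0),
    ("has_key", "press"): ({"nothing": 1.0}, 1.0),
}

machine = RDPMealyMachine("no_key", delta, output)

# The same action yields different expected rewards depending on the history.
_, reward_without_key = machine.predict([("look", "nothing")], "press")
_, reward_with_key = machine.predict([("look", "key"), ("look", "nothing")], "press")
```

In this sketch the prediction depends on the history only through the machine state, so a planner could treat that state as Markovian. This is the intuition behind compiling a fully specified RDP into an MDP, although the construction used in the paper may differ.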
