Paper Title

Hierarchies of Reward Machines

Authors

Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo

Abstract
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.
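To make the abstract's two core ideas concrete — an RM as a finite-state machine whose edges fire on high-level events and emit rewards, and an HRM as an RM whose edges may call other RMs as independently solvable subtasks — here is a minimal sketch. The class, the toy "get key, then open door" task, and all state/event names are illustrative assumptions, not the paper's actual formalism.

```python
class RewardMachine:
    """Finite-state machine over high-level events; edges emit rewards.
    A 'call' edge delegates to another RM, composing a hierarchy (HRM)."""

    def __init__(self, initial, final, edges, calls=None):
        self.initial = initial    # start state
        self.final = final        # accepting state
        self.edges = edges        # {(state, event): (next_state, reward)}
        self.calls = calls or {}  # {state: (sub_rm, next_state, reward)}

    def run(self, events):
        """Consume a trace; return (total_reward, n_events_used, accepted)."""
        state, total, i = self.initial, 0.0, 0
        while state != self.final and i < len(events):
            if state in self.calls:
                # Treat the called RM as its own subtask on the rest of
                # the trace; advance only if the sub-RM itself accepts.
                sub, nxt, bonus = self.calls[state]
                r, used, ok = sub.run(events[i:])
                if not ok:
                    break  # subtask unsolved: reject the trace
                total, i, state = total + r + bonus, i + used, nxt
            elif (state, events[i]) in self.edges:
                state, r = self.edges[(state, events[i])]
                total += r
                i += 1
            else:
                i += 1  # event irrelevant to the current state: skip it
        return total, i, state == self.final


# Hypothetical task: "get the key, then open the door", with the key
# subtask factored out as its own RM and called from the root RM.
get_key = RewardMachine("u0", "u1", {("u0", "key"): ("u1", 0.0)})
open_door = RewardMachine(
    "q0", "q2",
    {("q1", "door"): ("q2", 1.0)},
    calls={"q0": (get_key, "q1", 0.0)},
)

reward, used, done = open_door.run(["step", "key", "step", "door"])
```

In an agent, each call edge would correspond to an option (a temporally extended policy) trained to drive its sub-RM to acceptance, which is how the abstract's decomposition into independently solvable subtasks is exploited.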
