Paper Title
Discrete-to-Deep Supervised Policy Learning
Paper Authors
Paper Abstract
Neural networks are effective function approximators, but they are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. For years, researchers have worked around this by employing experience replay or asynchronous parallel-agent systems. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally, it uses the input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method on two RL environments, Cartpole and an aircraft manoeuvring simulator.
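To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of the D2D-SPL steps for a Cartpole-like task: discretise continuous states by binning, read off the greedy action from a tabular actor-critic preference table, build input/target pairs, and fit a classifier. The bin edges, the preference table `H` (filled with random values here as a stand-in for learned values), the choice of representative input per discrete state, and the use of scikit-learn's `MLPClassifier` are all illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# --- Discretisation: map a continuous state to a discrete index via per-dimension bins ---
# Bin edges are illustrative, loosely based on Cartpole's state ranges.
bins = [np.linspace(-2.4, 2.4, 9),    # cart position
        np.linspace(-3.0, 3.0, 9),    # cart velocity
        np.linspace(-0.21, 0.21, 9),  # pole angle
        np.linspace(-3.0, 3.0, 9)]    # pole angular velocity

def discretise(state):
    """Return a single discrete-state index for a continuous state vector."""
    idx = 0
    for value, edges in zip(state, bins):
        idx = idx * (len(edges) + 1) + int(np.digitize(value, edges))
    return idx

n_states = int(np.prod([len(e) + 1 for e in bins]))
n_actions = 2

# Numerical action preferences H[s, a], assumed already learned by a tabular
# actor-critic; random values here stand in for the learned preferences.
H = np.random.randn(n_states, n_actions)

# --- Build input/target pairs: one representative continuous state per visited
# discrete state, paired with the action of highest preference in that state ---
visited = [np.random.uniform([-2.4, -3, -0.21, -3], [2.4, 3, 0.21, 3])
           for _ in range(500)]  # stand-in for states seen during actor-critic training
X, y = [], []
seen = set()
for s in visited:
    d = discretise(s)
    if d in seen:
        continue
    seen.add(d)
    X.append(s)                      # representative input for this discrete state
    y.append(int(np.argmax(H[d])))   # greedy action under the learned preferences

# --- Train a classifier on the extracted pairs; it generalises the discrete
# policy back to the continuous state space ---
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000)
clf.fit(np.array(X), np.array(y))

print(clf.predict(np.array([[0.1, 0.0, 0.02, 0.0]])))  # action for a sample state
```

Because the classifier is trained with ordinary supervised learning on the extracted pairs, the correlated-sample problem of online RL does not arise at this stage, which is the intuition behind D2D-SPL's fast training with a single agent and no experience replay.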