Paper Title
Decoupling Representation Learning from Reinforcement Learning
Paper Authors
Paper Abstract
In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.
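To make the abstract's description of ATC concrete, the following is a minimal PyTorch-style sketch of an InfoNCE temporal contrastive update over augmented, time-shifted observation pairs. The names used here (TemporalContrastLoss, atc_style_update, augment, target_encoder) are illustrative assumptions and simplify the architecture described in the paper; the full implementation is in the linked rlpyt repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalContrastLoss(nn.Module):
    """InfoNCE-style loss over pairs (o_t, o_{t+k}): each anchor's positive is the
    time-shifted observation from the same trajectory; the other time-shifted
    observations in the batch serve as negatives."""

    def __init__(self, latent_dim):
        super().__init__()
        # Bilinear similarity between anchor and positive latents.
        self.W = nn.Parameter(0.01 * torch.randn(latent_dim, latent_dim))

    def forward(self, anchor_latents, positive_latents):
        # anchor_latents, positive_latents: [B, latent_dim]
        logits = anchor_latents @ self.W @ positive_latents.t()  # [B, B] similarity matrix
        labels = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
        return F.cross_entropy(logits, labels)

def atc_style_update(encoder, target_encoder, loss_fn, obs_t, obs_tk, augment, optimizer):
    """One contrastive update on a batch of observation pairs separated by a short
    time difference. `augment` applies random image augmentation (e.g. random shift)
    independently to anchors and positives; `target_encoder` is a slowly updated
    copy of `encoder` used to embed the positives."""
    z_anchor = encoder(augment(obs_t))                # online encoder on augmented o_t
    with torch.no_grad():
        z_positive = target_encoder(augment(obs_tk))  # target encoder on augmented o_{t+k}
    loss = loss_fn(z_anchor, z_positive)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the encoder trained by the contrastive loss would be shared with the RL agent, which is what allows policy learning to proceed without backpropagating reward gradients into the convolutional layers.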