Paper Title
Can Wikipedia Help Offline Reinforcement Learning?
Paper Authors
Paper Abstract
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling, with improved results following the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence. In this paper, we take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of sequence models pre-trained on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only brings to light the potential of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.
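To make the core idea concrete, below is a minimal, hedged sketch (not the authors' released code) of how a Wikipedia/GPT-2 pre-trained Transformer could serve as the backbone of a Decision Transformer-style offline RL model: trajectory tokens (return-to-go, state, action) are projected into the language model's embedding space, and the pre-trained weights replace from-scratch initialization. The class and variable names here are hypothetical illustrations.

```python
# Illustrative sketch: reuse a GPT-2 language model as the sequence backbone
# for offline RL framed as sequence modeling. Names are hypothetical.
import torch
import torch.nn as nn
from transformers import GPT2Model  # pre-trained on web text (incl. Wikipedia-style corpora)


class PretrainedTrajectoryModel(nn.Module):
    def __init__(self, state_dim, act_dim, hidden_size=768):
        super().__init__()
        # Initialize the Transformer from GPT-2 weights instead of from scratch.
        self.backbone = GPT2Model.from_pretrained("gpt2")
        # Project each trajectory modality into the LM's embedding space.
        self.embed_return = nn.Linear(1, hidden_size)
        self.embed_state = nn.Linear(state_dim, hidden_size)
        self.embed_action = nn.Linear(act_dim, hidden_size)
        # Predict the next action from the hidden state at each state token.
        self.predict_action = nn.Linear(hidden_size, act_dim)

    def forward(self, returns_to_go, states, actions):
        # All inputs have shape (batch, seq_len, feature_dim).
        r = self.embed_return(returns_to_go)
        s = self.embed_state(states)
        a = self.embed_action(actions)
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        batch, seq_len, hidden = s.shape
        tokens = torch.stack([r, s, a], dim=2).reshape(batch, 3 * seq_len, hidden)
        hidden_states = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Hidden states at state-token positions are used to predict actions.
        state_hidden = hidden_states[:, 1::3]
        return self.predict_action(state_hidden)
```

In this sketch, fine-tuning on an offline RL dataset would update both the newly added projection layers and the pre-trained backbone, which is the transfer setting the abstract describes.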