Paper Title

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

Authors

Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup

Abstract

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms have access to a simulator and can train with large amounts of data, in realistic settings, including games played against people, collecting experience can be quite costly. In this paper, we introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We design this architecture by incorporating advances achieved in recent years in the fields of Natural Language Processing and Computer Vision. Specifically, we propose a visually attentive model that uses transformers to learn a self-attention mechanism over the feature maps of the state representation, while simultaneously optimizing return. We demonstrate empirically that this architecture improves sample complexity on several Atari environments, while also achieving better performance in some of the games.
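To make the architectural idea concrete, below is a minimal PyTorch sketch of the kind of model the abstract describes: a convolutional encoder over stacked Atari frames, a transformer self-attention layer applied over the spatial positions of the resulting feature map, and a Q-value head. This is an illustrative sketch only, not the authors' implementation; the class name `AttentiveQNetwork`, the layer sizes, the use of `nn.TransformerEncoderLayer`, and the omission of positional encodings are all assumptions.

```python
# Hedged sketch: self-attention over CNN feature maps feeding a value head.
# All module names, layer sizes, and hyperparameters are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class AttentiveQNetwork(nn.Module):
    def __init__(self, num_actions: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Standard Atari-style convolutional encoder (4 stacked 84x84 frames).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Self-attention over spatial positions of the feature map: each of
        # the H*W positions is treated as one token of dimension embed_dim.
        # (Positional encodings omitted here for brevity.)
        self.attention = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        # Q-value head: one output per action, trained by optimizing return.
        self.q_head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, 4, 84, 84) stacked grayscale frames.
        feats = self.encoder(obs)                  # (B, C, H, W)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens
        attended = self.attention(tokens)          # attend over positions
        pooled = attended.mean(dim=1)              # (B, C) global pooling
        return self.q_head(pooled)                 # (B, num_actions) Q-values

# Usage example: a six-action Atari game with a batch of one observation.
q_net = AttentiveQNetwork(num_actions=6)
q_values = q_net(torch.zeros(1, 4, 84, 84))
```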
