Paper Title
Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning
Paper Authors
Paper Abstract
While reinforcement learning has achieved considerable success in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representation, and recent work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions in isolation, ignoring the interaction between them when generating embedded representations. In this work, we establish the theoretical foundations for the validity of training a reinforcement learning agent using embedded states and actions. We then propose a new approach for jointly learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning and can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions, and we present a generic architecture that leverages these embeddings to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations on several gaming, robotic control, and recommender system tasks show that our approach significantly outperforms state-of-the-art models in both discrete and continuous domains with large state and action spaces, confirming its efficacy.
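To make the core idea concrete, below is a minimal PyTorch sketch of jointly learning state and action embeddings through an environment model: a state encoder and an action embedding table are trained together by predicting the next state's embedding from the current state-action pair, so the two embedding spaces are shaped by their interaction. All module names, layer sizes, the MSE loss, and the stop-gradient on the target are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of jointly-learned state/action embeddings via an
# environment model. Illustrative only; not the authors' exact design.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Maps raw states to a low-dimensional embedding space."""
    def __init__(self, state_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s):
        return self.net(s)

class TransitionModel(nn.Module):
    """Environment model in embedding space: predicts the next state's
    embedding from the current state embedding and action embedding,
    coupling the two embedding spaces during training."""
    def __init__(self, embed_dim, action_embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_embed_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, z_s, z_a):
        return self.net(torch.cat([z_s, z_a], dim=-1))

state_dim, n_actions, embed_dim, action_embed_dim = 8, 16, 4, 4
encoder = StateEncoder(state_dim, embed_dim)
action_emb = nn.Embedding(n_actions, action_embed_dim)  # learned jointly
model = TransitionModel(embed_dim, action_embed_dim)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(action_emb.parameters())
    + list(model.parameters()), lr=1e-3)

# One gradient step on a batch of observed transitions (s, a, s').
# Random tensors stand in for real environment data here.
s = torch.randn(32, state_dim)
a = torch.randint(0, n_actions, (32,))
s_next = torch.randn(32, state_dim)

z_next_pred = model(encoder(s), action_emb(a))
# Stop-gradient on the target embedding to avoid the trivial collapsed
# solution; a common choice, though the paper may handle this differently.
target = encoder(s_next).detach()
loss = nn.functional.mse_loss(z_next_pred, target)
opt.zero_grad()
loss.backward()
opt.step()
# A policy network can then consume (z_s, z_a) instead of raw (s, a),
# generalizing across states and actions that embed nearby.
```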