Paper Title

Policy Gradient RL Algorithms as Directed Acyclic Graphs

Authors

Juan Jose Garau Luis

Abstract

Meta Reinforcement Learning (RL) methods focus on automating the design of RL algorithms that generalize to a wide range of environments. The framework introduced in (Anonymous, 2020) addresses this problem by representing different RL algorithms as Directed Acyclic Graphs (DAGs) and using an evolutionary meta-learner to modify these graphs and find good agent update rules. While the search language used to generate graphs in that paper can represent numerous existing RL algorithms (e.g., DQN, DDQN), it has limitations when it comes to representing Policy Gradient algorithms. In this work, we aim to close this gap by extending the original search language and proposing graphs for five different Policy Gradient algorithms: VPG, PPO, DDPG, TD3, and SAC.
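
To make the DAG idea concrete, below is a minimal sketch of how an agent update rule could be encoded as a graph of operator nodes that a meta-learner mutates. This is an illustrative assumption, not the paper's actual search language: the node classes (Input, Multiply, Negate, Mean) and the VPG-style surrogate loss wired here are hypothetical stand-ins for whatever operator set the framework defines.

```python
import numpy as np

# Hypothetical node types for a DAG-based search language.
# The real framework's operator set is richer; this only shows the shape
# of the representation: nodes with parent edges, evaluated bottom-up.
class Node:
    def __init__(self, *parents):
        self.parents = parents

    def evaluate(self, inputs):
        raise NotImplementedError

class Input(Node):
    """Leaf node reading a named tensor from the agent's batch."""
    def __init__(self, name):
        super().__init__()
        self.name = name

    def evaluate(self, inputs):
        return inputs[self.name]

class Multiply(Node):
    def evaluate(self, inputs):
        a, b = (p.evaluate(inputs) for p in self.parents)
        return a * b

class Negate(Node):
    def evaluate(self, inputs):
        return -self.parents[0].evaluate(inputs)

class Mean(Node):
    def evaluate(self, inputs):
        return np.mean(self.parents[0].evaluate(inputs))

# A VPG-style surrogate loss expressed as a DAG:
#   L = -mean(log_prob * return)
# An evolutionary meta-learner would mutate this graph (swap operators,
# rewire edges) to search for better update rules.
log_prob = Input("log_prob")
ret = Input("return")
loss = Mean(Negate(Multiply(log_prob, ret)))

batch = {"log_prob": np.array([-0.1, -0.3]),
         "return": np.array([1.0, 0.5])}
print(loss.evaluate(batch))  # scalar loss for this batch
```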
