Paper title
Criticality-Based Varying Step-Number Algorithm for Reinforcement Learning
Paper authors
Paper abstract
In the context of reinforcement learning, we introduce the concept of the criticality of a state, which indicates the extent to which the choice of action in that particular state influences the expected return. That is, a state in which the choice of action is more likely to influence the final outcome is considered more critical than a state in which it is less likely to influence the final outcome. We formulate a criticality-based varying step-number algorithm (CVS), a flexible step-number algorithm that uses a criticality function either provided by a human or learned directly from the environment. We test it in three different domains: the Atari Pong environment, the Road-Tree environment, and the Shooter environment. We demonstrate that CVS is able to outperform popular learning algorithms such as Deep Q-Learning and Monte Carlo.
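The abstract does not spell out how criticality determines the step number, so the following is only an illustrative sketch of one plausible rule, not the authors' formulation. It assumes that rewards are accumulated until the summed criticality of the visited states reaches a threshold, after which the target bootstraps from a value estimate. The function name `varying_step_return`, the `threshold` parameter, and the `bootstrap_value` callback are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def varying_step_return(
    rewards: Sequence[float],
    criticalities: Sequence[float],
    bootstrap_value: Callable[[int], float],
    gamma: float = 0.99,
    threshold: float = 1.0,
) -> Tuple[float, int]:
    """Compute a varying-step target for the first state of a trajectory.

    rewards[k] is the reward received after acting in state k.
    criticalities[k] is the (assumed) criticality of state k, in [0, 1].
    bootstrap_value(k) is a hypothetical value estimate for state k.
    Returns the target and the number of steps n that were used.
    """
    if len(rewards) != len(criticalities):
        raise ValueError("rewards and criticalities must have equal length")

    target = 0.0
    accumulated = 0.0
    n = 0
    for k, (reward, crit) in enumerate(zip(rewards, criticalities)):
        target += (gamma ** k) * reward
        accumulated += crit
        n = k + 1
        # Assumed stopping rule: stop once enough criticality has been seen.
        if accumulated >= threshold:
            break

    # Bootstrap only if the trajectory continues past step n. Otherwise the
    # episode is assumed to have ended and the target is a full return.
    if n < len(rewards):
        target += (gamma ** n) * bootstrap_value(n)
    return target, n


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    criticalities = [0.5, 0.5, 0.0, 0.0]
    print(varying_step_return(rewards, criticalities, lambda k: 2.0, gamma=0.5))
```

Under this assumed rule, a trajectory of highly critical states yields a one-step target similar to the one used in Q-learning, while a trajectory whose criticality never reaches the threshold falls back to a full Monte Carlo return. That is one way to read the abstract's description of CVS as a flexible step-number algorithm.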