Paper Title

Hyperbolic Deep Reinforcement Learning

Authors

Cetin, Edoardo; Chamberlain, Benjamin; Bronstein, Michael; Hunt, Jonathan J.

Abstract

We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near-universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
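The claim that hyperbolic geometry suits hierarchical information can be illustrated with the Poincaré ball model, a common choice in hyperbolic deep learning. The sketch below is an illustrative assumption, not the authors' method. It maps Euclidean feature vectors (for example, an encoder's output) onto the ball with the exponential map at the origin and then measures geodesic distance. It does not include the stabilization technique the paper proposes. The names `expmap0`, `project` and `poincare_distance` and the boundary margin `eps` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def project(x: Vector, c: float = 1.0, eps: float = 1e-5) -> Vector:
    """Rescale x so it stays strictly inside the ball of radius 1/sqrt(c).

    Hypothetical safety margin: without it, tanh saturates to 1.0 in floating
    point and distances to boundary points become infinite.
    """
    max_norm = (1.0 - eps) / math.sqrt(c)
    n = norm(x)
    if n > max_norm:
        return [xi * max_norm / n for xi in x]
    return list(x)


def expmap0(v: Vector, c: float = 1.0, min_norm: float = 1e-15) -> Vector:
    """Exponential map at the origin of the Poincare ball with curvature -c.

    exp_0(v) = tanh(sqrt(c) * |v|) * v / (sqrt(c) * |v|)
    """
    n = norm(v)
    if n < min_norm:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return project([scale * x for x in v], c)


def poincare_distance(x: Vector, y: Vector, c: float = 1.0) -> float:
    """Geodesic distance between two points of the Poincare ball (curvature -c)."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * norm(x) ** 2) * (1.0 - c * norm(y) ** 2)
    return math.acosh(1.0 + 2.0 * c * diff_sq / denom) / math.sqrt(c)


if __name__ == "__main__":
    # Same Euclidean gap (0.1), measured near the origin and near the boundary.
    near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
    near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
    print(f"near origin: {near_origin:.4f}, near boundary: {near_boundary:.4f}")
```

For points on a diameter of the unit ball the distance reduces to 2|artanh(r1) - artanh(r2)|. The same Euclidean gap of 0.1 therefore measures about 0.20 near the origin but about 0.75 between radii 0.8 and 0.9. This growth toward the boundary is what lets tree-like, hierarchical structures embed with low distortion.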