Paper Title

Reachability-Aware Laplacian Representation in Reinforcement Learning

Paper Authors

Kaixin Wang, Kuangqi Zhou, Jiashi Feng, Bryan Hooi, Xinchao Wang

Paper Abstract

In Reinforcement Learning (RL), the Laplacian Representation (LapRep) is a task-agnostic state representation that encodes the geometry of the environment. A desirable property of LapRep stated in prior works is that the Euclidean distance in the LapRep space roughly reflects the reachability between states, which motivates using this distance for reward shaping. However, we find that LapRep does not necessarily have this property in general: two states that are close under LapRep can actually be far apart in the environment. Such a mismatch can impede the learning process in reward shaping. To fix this issue, we introduce the Reachability-Aware Laplacian Representation (RA-LapRep), obtained by properly scaling each dimension of LapRep. Despite its simplicity, we demonstrate through both theoretical explanations and experimental results that RA-LapRep captures inter-state reachability better than LapRep. Additionally, we show that this improvement yields a significant boost in reward shaping performance and also benefits bottleneck state discovery.
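To make the "scaling each dimension" idea concrete, below is a minimal sketch in Python (NumPy only) for a small tabular environment. It builds the graph Laplacian from an adjacency matrix, takes the smallest nontrivial eigenvectors as LapRep, and forms a reachability-aware variant by dividing each dimension by the square root of its eigenvalue. The 1/sqrt(eigenvalue) scaling and the laplacian_reps helper here are illustrative assumptions consistent with the abstract's description, not necessarily the paper's exact construction.

```python
import numpy as np

def laplacian_reps(adjacency, d):
    """Return d-dimensional LapRep and a scaled variant (RA-LapRep sketch).

    adjacency: (n, n) symmetric 0/1 matrix of the state-transition graph.
    Both outputs have shape (n, d); row s is the embedding of state s.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh returns eigenvalues in ascending order; eigvecs[:, i] matches eigvals[i].
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    vals, vecs = eigvals[1:d + 1], eigvecs[:, 1:d + 1]
    laprep = vecs
    # Assumed scaling: divide dimension i by sqrt(eigenvalue_i), so slowly-varying
    # (small-eigenvalue) dimensions dominate the Euclidean distance.
    ra_laprep = vecs / np.sqrt(vals)
    return laprep, ra_laprep

# Tiny example: a 6-state corridor (path graph).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

lap, ra = laplacian_reps(A, d=3)
# Euclidean distances from state 0 to every state under each representation.
print(np.linalg.norm(lap - lap[0], axis=1))
print(np.linalg.norm(ra - ra[0], axis=1))
```

Printing both distance profiles from state 0 lets one compare how each representation orders the remaining states by distance, which is the kind of reachability alignment the abstract discusses.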
