Paper Title
Understanding Self-Predictive Learning for Reinforcement Learning
Paper Authors
Paper Abstract
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, this family of algorithms has an apparent defect: trivial representations (such as constants) minimize the prediction error, yet converging to such solutions is obviously undesirable. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that a faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that the self-predictive learning dynamics carry out a spectral decomposition of the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
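To make the two design choices highlighted in the abstract concrete (a faster-paced predictor and semi-gradient updates on the representation), the sketch below runs a toy version of self-predictive learning dynamics on a small synthetic tabular MDP. It is an assumption-laden illustration, not the authors' implementation; names such as `transition`, `phi`, and `predictor` are chosen here for exposition only.

```python
import numpy as np

# A minimal sketch of self-predictive learning dynamics on a synthetic tabular MDP.
# Illustrative only; hyperparameters and variable names are assumptions.

rng = np.random.default_rng(0)
num_states, latent_dim = 20, 4

# Random row-stochastic state transition matrix T.
transition = rng.random((num_states, num_states))
transition /= transition.sum(axis=1, keepdims=True)

# Representation Phi: one latent vector per state; the predictor P acts in latent space.
phi = rng.normal(size=(num_states, latent_dim))
lr_phi = 0.01

for _ in range(5000):
    # Expected next-state latent under the transition dynamics, used as the target.
    target = transition @ phi  # shape: (num_states, latent_dim)

    # Faster-paced predictor: rather than taking a single gradient step, solve the
    # inner problem P = argmin_P ||Phi P - T Phi||^2 to optimality at every iteration.
    predictor, *_ = np.linalg.lstsq(phi, target, rcond=None)

    # Semi-gradient update on Phi: the target `transition @ phi` is treated as a
    # constant (stop-gradient), so gradients flow only through the prediction term.
    grad_phi = (phi @ predictor - target) @ predictor.T
    phi -= lr_phi * grad_phi

# A collapsed (constant) representation would have rank 1; with the faster-paced
# predictor and semi-gradient updates, this toy run should retain full rank.
print("representation rank:", np.linalg.matrix_rank(phi, tol=1e-6))
```

In this sketch, removing either ingredient, e.g., backpropagating through the target instead of stopping its gradient, would reintroduce the trivial solutions that the abstract warns about, which is the behavior the paper's analysis is meant to explain.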