Paper Title
Generative Temporal Difference Learning for Infinite-Horizon Prediction
Authors
Abstract
We introduce the $γ$-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with $γ$-models leads to generalizations of the procedures central to model-based control, including the model rollout and model-based value estimation. The $γ$-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the $γ$-model as both a generative adversarial network and a normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.
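To make the link to the successor representation concrete, the following is a minimal tabular sketch, not the GAN or normalizing-flow instantiation described in the abstract. It assumes a small finite Markov chain under a fixed policy and defines the discounted occupancy as $\mu(s_e \mid s) = (1-γ)\sum_{k \ge 1} γ^{k-1} p(s_{t+k} = s_e \mid s_t = s)$. The transition matrix `P`, the reward vector `REWARD`, all function names, and the convention that rewards are counted from the next state onward are illustrative assumptions. The sketch iterates an expected temporal-difference backup to its fixed point, then checks that a value estimate computed from the occupancy alone matches a long explicit rollout sum.

```python
# Tabular sketch of a discounted-occupancy ("gamma-model"-like) fixed point.
# Everything here is a hypothetical illustration, not the paper's implementation.

GAMMA = 0.9

# Hypothetical 3-state Markov chain under a fixed policy: P[s][s2] = p(s2 | s).
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],  # state 2 is absorbing
]
REWARD = [0.0, 0.0, 1.0]  # hypothetical state-based reward


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def td_backup(mu, p, gamma):
    """One expected TD backup: mu <- (1 - gamma) * P + gamma * (P @ mu)."""
    n = len(p)
    boot = matmul(p, mu)  # bootstrap from the occupancy at the next state
    return [[(1.0 - gamma) * p[i][j] + gamma * boot[i][j] for j in range(n)]
            for i in range(n)]


def solve_occupancy(p, gamma, iters=500):
    """Iterate the backup from a uniform initial guess until it settles."""
    n = len(p)
    mu = [[1.0 / n] * n for _ in range(n)]
    for _ in range(iters):
        mu = td_backup(mu, p, gamma)
    return mu


def value_from_occupancy(mu, reward, gamma):
    """V(s) = 1 / (1 - gamma) * E_{s_e ~ mu(. | s)}[r(s_e)]."""
    return [sum(m * r for m, r in zip(row, reward)) / (1.0 - gamma)
            for row in mu]


def value_from_rollout(p, reward, gamma, horizon=500):
    """Reference value: sum_{k>=1} gamma^(k-1) * (P^k r)(s), truncated."""
    n = len(p)
    values = [0.0] * n
    vec = list(reward)
    for k in range(1, horizon + 1):
        # vec becomes P^k r: the expected reward k steps ahead.
        vec = [sum(p[i][j] * vec[j] for j in range(n)) for i in range(n)]
        values = [v + gamma ** (k - 1) * x for v, x in zip(values, vec)]
    return values


if __name__ == "__main__":
    mu = solve_occupancy(P, GAMMA)
    print("occupancy rows:", mu)
    print("value via occupancy:", value_from_occupancy(mu, REWARD, GAMMA))
    print("value via rollout:  ", value_from_rollout(P, REWARD, GAMMA))
```

In this sketch the occupancy never uses the reward during the backup, so the same `mu` can be reused with any reward vector, which mirrors the abstract's point that the model is independent of task reward while still summarizing the long-term future.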