Active World Model Learning with Progress Curiosity

Paper title

Active World Model Learning with Progress Curiosity

Authors

Kim, Kuno; Sano, Megumi; De Freitas, Julian; Haber, Nick; Yamins, Daniel

Abstract

World models are self-supervised predictive models of how the world evolves. Humans learn world models by curiously exploring their environment, in the process acquiring compact abstractions of high bandwidth sensory inputs, the ability to plan across long temporal horizons, and an understanding of the behavioral patterns of other agents. In this work, we study how to design such a curiosity-driven Active World Model Learning (AWML) system. To do so, we construct a curious agent building world models while visually exploring a 3D physical environment rich with distillations of representative real-world agents. We propose an AWML system driven by $γ$-Progress: a scalable and effective learning progress-based curiosity signal. We show that $γ$-Progress naturally gives rise to an exploration policy that directs attention to complex but learnable dynamics in a balanced manner, thus overcoming the "white noise problem". As a result, our $γ$-Progress-driven controller achieves significantly higher AWML performance than baseline controllers equipped with state-of-the-art exploration strategies such as Random Network Distillation and Model Disagreement.
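The abstract names a learning-progress curiosity signal and the "white noise problem" but does not define either formally. The toy sketch below illustrates the general idea under a loudly stated assumption: the progress reward is taken to be the loss of a lagging "old" model (an exponential moving average of the current parameters with mixing constant gamma) minus the loss of the current model. The function `run_stream`, the scalar predictor, and all hyperparameters are hypothetical choices made for illustration, not the authors' implementation.

```python
import random


def run_stream(sample, steps=100, lr=0.1, gamma=0.9):
    """Return (mean prediction error, mean progress reward) for one data stream.

    Assumed formulation (not taken from the abstract):
    progress = loss(old model) - loss(current model),
    where the old model is an exponential moving average of the current one.
    """
    w_new = 0.0  # current model: a single constant prediction
    w_old = 0.0  # lagging copy of the model
    total_error = 0.0
    total_progress = 0.0
    for _ in range(steps):
        y = sample()
        loss_new = (y - w_new) ** 2
        loss_old = (y - w_old) ** 2
        total_error += loss_new                 # prediction-error curiosity
        total_progress += loss_old - loss_new   # learning-progress curiosity
        w_new += lr * (y - w_new)               # one gradient-style update
        w_old = gamma * w_old + (1.0 - gamma) * w_new  # old model trails behind
    return total_error / steps, total_progress / steps


rng = random.Random(0)
# Learnable stream: the target is always 1.0.
err_learn, prog_learn = run_stream(lambda: 1.0)
# Unlearnable stream: zero-mean uniform noise.
err_noise, prog_noise = run_stream(lambda: rng.uniform(-1.0, 1.0))

print("learnable: error=%.4f progress=%.4f" % (err_learn, prog_learn))
print("noise:     error=%.4f progress=%.4f" % (err_noise, prog_noise))
```

In this toy setting a controller that chases raw prediction error would favor the noise stream, whose error never goes away, while the progress reward is positive only as long as the learnable stream is still being learned and hovers near zero on noise.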
