Paper title
Causal Imitation Learning under Temporally Correlated Noise
Paper authors
Abstract
We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.
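To illustrate why instrumental variable regression helps when noise is shared between inputs and targets, the following is a minimal sketch of classic two-stage least squares on synthetic linear data. It is not DoubIL or ResiduIL; it is the textbook estimator those methods are described as modernizing. The data-generating process, the names `simulate`, `ols`, and `two_stage_least_squares`, and the true coefficient of 2.0 are all assumptions made for this illustration.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Hypothetical confounded data: x and y share the noise term u."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument: drives x, independent of u
    u = rng.normal(size=n)                 # noise that corrupts both x and y
    x = z + u + 0.1 * rng.normal(size=n)   # regressor contaminated by u
    y = beta * x + u                       # target: true linear map plus the same u
    return z, x, y


def ols(x, y):
    """Ordinary least squares slope (no intercept; data are zero-mean)."""
    return float(x @ y / (x @ x))


def two_stage_least_squares(z, x, y):
    """Classic 2SLS with a single instrument and a single regressor."""
    # Stage 1: keep only the part of x that is explained by the instrument z.
    x_hat = z * (z @ x) / (z @ z)
    # Stage 2: regress y on the projected regressor, which is uncorrelated with u.
    return float(x_hat @ y / (x_hat @ x_hat))


z, x, y = simulate()
beta_ols = ols(x, y)                         # biased upward by the shared noise
beta_iv = two_stage_least_squares(z, x, y)   # close to the true coefficient
print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_iv:.3f}")
```

In this toy setup, plain regression plays the role of behavioral cloning: it attributes part of the shared noise to the regressor and overestimates the slope, while the instrumented estimate stays near the assumed true value of 2.0.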