Paper title
ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning
Paper authors
Abstract
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data. This approach is commonly referred to as behavioral cloning (BC). In this work, we describe a key disadvantage of BC that arises due to the maximum likelihood objective function; namely that BC is mean-seeking with respect to the state-conditional expert action distribution when the learner's policy is represented with a Gaussian. To address this issue, we introduce a modified version of BC, Adversarial Behavioral Cloning (ABC), that exhibits mode-seeking behavior by incorporating elements of GAN (generative adversarial network) training. We evaluate ABC on toy domains and a domain based on Hopper from the DeepMind Control suite, and show that it outperforms standard BC by being mode-seeking in nature.
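The mean-seeking claim can be illustrated with a small numerical sketch. It is not taken from the paper: the bimodal expert (modes at -1 and +1 with standard deviation 0.1), the sample size, and the helper names `sample_expert_actions` and `fit_gaussian_mle` are all assumptions made for illustration. For a single fixed state, the maximum likelihood Gaussian has the sample mean and the population standard deviation as its parameters, so the fitted mean lands between the two expert modes, where the expert almost never acts.

```python
import random
import statistics


def sample_expert_actions(n, seed=0):
    """Hypothetical expert for one fixed state: picks the mode at -1 or +1
    with equal probability, then adds small Gaussian noise (std 0.1)."""
    rng = random.Random(seed)
    actions = []
    for _ in range(n):
        mode = -1.0 if rng.random() < 0.5 else 1.0
        actions.append(rng.gauss(mode, 0.1))
    return actions


def fit_gaussian_mle(actions):
    """Maximum likelihood fit of a single Gaussian: the sample mean and the
    population (biased) standard deviation."""
    mu = statistics.fmean(actions)
    sigma = statistics.pstdev(actions, mu)
    return mu, sigma


actions = sample_expert_actions(10_000)
mu, sigma = fit_gaussian_mle(actions)

# Distance from the fitted mean to the closest expert mode.
nearest_mode_gap = min(abs(mu + 1.0), abs(mu - 1.0))

# Fraction of expert actions that fall within 0.5 of the fitted mean.
near_mean = sum(abs(a - mu) < 0.5 for a in actions) / len(actions)

print(f"fitted mean={mu:.3f}, fitted std={sigma:.3f}")
print(f"gap to nearest mode={nearest_mode_gap:.3f}")
print(f"fraction of expert actions near fitted mean={near_mean:.4f}")
```

Under these assumptions the fitted mean sits close to 0, roughly one unit away from either mode, and almost no expert actions fall near it. A mode-seeking learner would instead commit to one of the two modes, which is the behavior the abstract attributes to ABC.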