Paper Title
Learning Options via Compression
Paper Authors
Paper Abstract
Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected experiences into a sequence of skills. A popular approach to skill learning is maximizing the likelihood of the pre-collected experience with latent variable models, where the latent variables represent the skills. However, there are often many solutions that maximize the likelihood equally well, including degenerate solutions. To address this underspecification, we propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills. This penalty incentivizes the skills to maximally extract common structures from the experiences. Empirically, our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood. Further, while most prior works in the offline multi-task setting focus on tasks with low-dimensional observations, our objective can scale to challenging tasks with high-dimensional image observations.
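As a rough sketch of what such a combined objective might look like (the abstract does not state the formula; the dataset \mathcal{D}, trajectories \tau, latent skill sequence z_{1:K}, model parameters \theta, and trade-off coefficient \beta below are assumed notation for illustration only):

\[
\max_{\theta}\;
\underbrace{\mathbb{E}_{\tau \sim \mathcal{D}}\big[\log p_{\theta}(\tau)\big]}_{\text{maximum likelihood}}
\;-\;
\beta\,
\underbrace{\mathbb{E}_{\tau \sim \mathcal{D}}\big[\mathrm{DL}_{\theta}(z_{1:K} \mid \tau)\big]}_{\text{skill description length}}
\]

Under these assumptions, the first term is the standard maximum-likelihood objective of the latent variable model on the pre-collected experience, while the second term penalizes the number of bits needed to describe the skills used to explain each trajectory, pushing the model to reuse a small set of common skills across tasks rather than settle on degenerate solutions.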