Paper Title

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

Authors

Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel

Abstract

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. CIC substantially improves over prior methods in terms of adaptation efficiency, outperforming prior unsupervised skill discovery methods by 1.79x and the next leading overall exploration algorithm by 1.18x.
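The two ingredients named in the abstract, a contrastive objective between state-transition and skill embeddings and an entropy-based intrinsic reward, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the InfoNCE-style loss, the k-nearest-neighbor entropy proxy, the function names `info_nce_loss` and `knn_entropy_reward`, and the values of `temperature` and `k` are all assumptions chosen for illustration, and the embeddings are taken as given rather than produced by learned encoders.

```python
import numpy as np


def info_nce_loss(transition_emb, skill_emb, temperature=0.5):
    """InfoNCE-style contrastive loss (assumed form, for illustration).

    Row i of transition_emb and row i of skill_emb form a positive pair;
    every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    q = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    k = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = q @ k.T / temperature                        # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal.
    return -np.mean(np.diag(log_probs))


def knn_entropy_reward(emb, k=3):
    """Particle-style entropy proxy (assumed form, for illustration).

    Each embedding is rewarded by the log of its mean distance to its
    k nearest neighbors in the batch, so isolated points score higher.
    """
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    sorted_d = np.sort(dists, axis=1)
    # Column 0 is the zero distance of each point to itself; skip it.
    knn_mean = sorted_d[:, 1:k + 1].mean(axis=1)
    return np.log1p(knn_mean)
```

In this sketch, minimizing the contrastive loss ties each transition embedding to the skill that generated it, while the k-nearest-neighbor reward favors transitions whose embeddings lie far from the rest of the batch, which is one way to encourage the behavioral diversity the abstract describes.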
