Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

Paper Title

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

Authors

Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

Abstract

In view of its power in extracting feature representations, contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL), leading to efficient policy learning in various applications. Despite its tremendous empirical success, the theoretical understanding of contrastive learning for RL remains elusive. To narrow this gap, we study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss. Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs. We further theoretically prove that our algorithm recovers the true representations and simultaneously achieves sample efficiency in learning the optimal policy and Nash equilibrium in MDPs and MGs. We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL. To the best of our knowledge, we provide the first provably efficient online RL algorithm that incorporates contrastive learning for representation learning. Our code is available at https://github.com/Baichenjia/Contrastive-UCB.
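The two ingredients named in the abstract, a contrastive loss for learning features and a UCB-type exploration bonus, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the inner product as a logit, and the elliptical form of the bonus with parameters `beta` and `reg` are assumptions chosen for illustration, and the feature arrays stand in for the outputs of hypothetical encoders.

```python
import numpy as np


def contrastive_loss(phi_sa, psi_next, labels):
    """Logistic contrastive loss over scored (s, a, s') triples.

    phi_sa:   (n, d) features of state-action pairs (hypothetical encoder output)
    psi_next: (n, d) features of candidate next states (hypothetical encoder output)
    labels:   (n,) 1 if the next state was observed from the environment,
              0 if it is a negative sample
    Assumption: the inner product is treated as a logit.
    """
    scores = np.sum(phi_sa * psi_next, axis=1)
    # Numerically stable binary cross-entropy on logits.
    losses = (np.maximum(scores, 0.0) - scores * labels
              + np.log1p(np.exp(-np.abs(scores))))
    return float(np.mean(losses))


def ucb_bonus(phi_query, phi_data, beta=1.0, reg=1.0):
    """Elliptical bonus beta * sqrt(phi^T Sigma^{-1} phi) for each query row.

    Sigma = reg * I + sum of outer products of previously visited features.
    Assumption: this standard linear-feature bonus stands in for a UCB-type term.
    """
    d = phi_data.shape[1]
    cov = reg * np.eye(d) + phi_data.T @ phi_data
    solved = np.linalg.solve(cov, phi_query.T)      # shape (d, m)
    quad = np.sum(phi_query * solved.T, axis=1)     # phi^T Sigma^{-1} phi
    return beta * np.sqrt(quad)
```

In this sketch the bonus is large for feature directions that have rarely been visited and shrinks as matching data accumulates, which is the usual way an optimism term encourages exploration.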
