Paper Title
Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
Paper Authors
Paper Abstract
The maximum Tsallis entropy (MTE) framework in reinforcement learning has gained popularity recently by virtue of its flexible modeling choices, including the widely used Shannon entropy and sparse entropy. However, non-Shannon entropies suffer from approximation error and subsequent underperformance, due either to their sensitivity or to the lack of a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, through which MDQN is shown to fail to generalize to the MTE framework. The proposed method, Tsallis Advantage Learning (TAL), is verified in extensive experiments not only to significantly improve upon Tsallis-DQN for various non-closed-form Tsallis entropies, but also to exhibit performance comparable to state-of-the-art maximum Shannon entropy algorithms.
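For context, a standard-form sketch of the two regression targets the abstract connects (these are the usual MDQN and advantage-learning targets from the literature, not equations quoted from this paper; $q_{\bar\theta}$ is the target action-value estimate, $\pi_{\bar\theta} = \mathrm{softmax}(q_{\bar\theta}/\tau)$ the associated softmax policy, $\tau$ the entropy temperature, $\alpha$ the regularization weight, and $\gamma$ the discount factor):

\[
\hat{q}_{\mathrm{MDQN}}(s_t, a_t) = r_t + \alpha \tau \ln \pi_{\bar\theta}(a_t \mid s_t) + \gamma \sum_{a'} \pi_{\bar\theta}(a' \mid s_{t+1}) \bigl[ q_{\bar\theta}(s_{t+1}, a') - \tau \ln \pi_{\bar\theta}(a' \mid s_{t+1}) \bigr],
\]
\[
\hat{q}_{\mathrm{AL}}(s_t, a_t) = r_t + \gamma \max_{a'} q_{\bar\theta}(s_{t+1}, a') + \alpha \bigl[ q_{\bar\theta}(s_t, a_t) - \max_{a} q_{\bar\theta}(s_t, a) \bigr].
\]

Since $\tau \ln \pi_{\bar\theta}(a \mid s) = q_{\bar\theta}(s, a) - \tau \ln \sum_{b} \exp(q_{\bar\theta}(s, b)/\tau)$, the MDQN target degenerates to the advantage-learning form as $\tau \to 0$. This is the connection the abstract refers to: the scaled log-policy bonus requires a policy that can be written in closed form, which non-Shannon Tsallis entropies generally lack, whereas the advantage term $q_{\bar\theta}(s, a) - \max_{a} q_{\bar\theta}(s, a)$ remains computable; per the abstract, TAL enforces the implicit KL regularization through an advantage-style correction of this kind in the general Tsallis entropy setting.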