Paper Title

Distributional Reinforcement Learning with Regularized Wasserstein Loss

Paper Authors

Ke Sun, Yingnan Zhao, Wulong Liu, Bei Jiang, Linglong Kong

Paper Abstract

The empirical success of distributional reinforcement learning (RL) highly relies on the choice of distribution divergence equipped with an appropriate distribution representation. In this paper, we propose \textit{Sinkhorn distributional RL (SinkhornDRL)}, which leverages Sinkhorn divergence, a regularized Wasserstein loss, to minimize the difference between current and target Bellman return distributions. Theoretically, we prove the contraction properties of SinkhornDRL, aligning with the interpolation nature of Sinkhorn divergence between Wasserstein distance and Maximum Mean Discrepancy (MMD). The introduced SinkhornDRL enriches the family of distributional RL algorithms, contributing to interpreting the algorithm behaviors compared with existing approaches by our investigation into their relationships. Empirically, we show that SinkhornDRL consistently outperforms or matches existing algorithms on the Atari games suite and particularly stands out in the multi-dimensional reward setting. \thanks{Code is available at \url{https://github.com/datake/SinkhornDistRL}.}
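
To make the abstract's central quantity concrete, below is a minimal PyTorch sketch of a debiased Sinkhorn divergence between two sets of sampled returns, the kind of discrepancy a SinkhornDRL-style temporal-difference loss would compare. The function names (`entropic_ot`, `sinkhorn_divergence`), the uniform sample weights, the squared-distance cost, and the toy return samples are illustrative assumptions, not the paper's implementation; see the linked repository for the authors' code.

```python
import math
import torch

def entropic_ot(x, y, epsilon=1.0, n_iters=100):
    """Entropy-regularized OT cost between two 1-D sample sets with uniform
    weights, computed with log-domain Sinkhorn iterations (illustrative sketch)."""
    C = (x.unsqueeze(1) - y.unsqueeze(0)) ** 2                    # cost matrix C_ij = (x_i - y_j)^2
    n, m = len(x), len(y)
    log_a = torch.full((n,), -math.log(n))                        # log of uniform weights on x
    log_b = torch.full((m,), -math.log(m))                        # log of uniform weights on y
    f, g = torch.zeros(n), torch.zeros(m)                         # dual potentials
    for _ in range(n_iters):                                      # alternating Sinkhorn updates
        f = -epsilon * torch.logsumexp((g - C) / epsilon + log_b, dim=1)
        g = -epsilon * torch.logsumexp((f.unsqueeze(1) - C) / epsilon + log_a.unsqueeze(1), dim=0)
    # Transport plan in log space, then the transport cost <P, C>
    log_P = (f.unsqueeze(1) + g.unsqueeze(0) - C) / epsilon + log_a.unsqueeze(1) + log_b.unsqueeze(0)
    return (log_P.exp() * C).sum()

def sinkhorn_divergence(x, y, epsilon=1.0, n_iters=100):
    """Debiased Sinkhorn divergence:
    S_eps(x, y) = OT_eps(x, y) - 0.5 * OT_eps(x, x) - 0.5 * OT_eps(y, y)."""
    return (entropic_ot(x, y, epsilon, n_iters)
            - 0.5 * entropic_ot(x, x, epsilon, n_iters)
            - 0.5 * entropic_ot(y, y, epsilon, n_iters))

# Toy usage: compare samples from a "current" and a "target" return distribution.
current_returns = torch.randn(32)                                 # hypothetical samples from Z(s, a)
target_returns = 0.99 * torch.randn(32) + 0.5                     # hypothetical target Bellman samples
loss = sinkhorn_divergence(current_returns, target_returns, epsilon=1.0)
print(loss.item())
```

In this sketch, shrinking the regularization strength `epsilon` drives the quantity toward the Wasserstein cost, while increasing it pushes the divergence toward an MMD-like discrepancy, matching the interpolation behavior between Wasserstein distance and MMD that the abstract refers to.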
