Paper title

Risk-Averse Learning by Temporal Difference Methods

Authors

Kose, Umit; Ruszczynski, Andrzej

Abstract

We consider reinforcement learning with performance evaluated by a dynamic risk measure. We construct a projected risk-averse dynamic programming equation and study its properties. Then we propose risk-averse counterparts of the methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
