Paper Title

The Geometry of Robust Value Functions

Authors

Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

Abstract

The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov Decision Processes (MDPs). In this paper, we study the geometry of the robust value space for the more general Robust MDP (RMDP) setting, where transition uncertainty is taken into account. Specifically, since we find it hard to directly adapt prior approaches to RMDPs, we start by revisiting the non-robust case and introduce a new perspective that enables us to characterize both the non-robust and the robust value space in a similar fashion. The key to this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces. Through our analysis, we show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state. Furthermore, we find that taking only the extreme points of the uncertainty set is sufficient to determine the robust value space. Finally, we discuss other aspects of the robust value space, including its non-convexity and policy agreement on multiple states.
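To make these objects concrete, here is a minimal numerical sketch, not taken from the paper: it assumes a toy 2-state, 2-action MDP and a finite, (s,a)-rectangular uncertainty set given by a few candidate transition rows, and the helper names policy_value and robust_policy_value are illustrative. It enumerates the deterministic policies, computes each non-robust value exactly, and then evaluates one policy's robust value by iterating a worst-case Bellman evaluation over the candidate rows.

```python
import itertools
import numpy as np

# --- Toy MDP (illustrative, not from the paper) ----------------------------
gamma = 0.9
# P[a, s, s'] is the nominal transition kernel.
P = np.array([
    [[0.8, 0.2],   # action 0
     [0.3, 0.7]],
    [[0.1, 0.9],   # action 1
     [0.6, 0.4]],
])
R = np.array([
    [1.0, 0.0],    # r(s, a=0) for s = 0, 1
    [0.5, 2.0],    # r(s, a=1) for s = 0, 1
]).T               # shape (S, A): R[s, a]
S, A = 2, 2

def policy_value(pi, P, R, gamma):
    """Exact value V^pi = (I - gamma * P^pi)^{-1} r^pi for a deterministic policy pi
    (length-S array of chosen actions)."""
    P_pi = np.stack([P[pi[s], s] for s in range(S)])   # (S, S) transition under pi
    r_pi = np.array([R[s, pi[s]] for s in range(S)])   # (S,) reward under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Non-robust value space: values of all deterministic policies.
for pi in itertools.product(range(A), repeat=S):
    print("pi =", pi, "V^pi =", policy_value(np.array(pi), P, R, gamma))

# --- Robust evaluation over a finite (s,a)-rectangular uncertainty set -----
# For each (s, a), the uncertainty set is the convex hull of a few candidate
# rows; only these extreme points are enumerated below.
candidates = {
    (s, a): [P[a, s],
             np.array([0.95, 0.05]) if s == 0 else np.array([0.05, 0.95])]
    for s in range(S) for a in range(A)
}

def robust_policy_value(pi, R, gamma, candidates, iters=500):
    """Iterate the worst-case Bellman evaluation operator:
    V(s) <- r(s, pi(s)) + gamma * min over candidate rows of (row @ V)."""
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.empty(S)
        for s in range(S):
            a = pi[s]
            worst = min(row @ V for row in candidates[(s, a)])
            V_new[s] = R[s, a] + gamma * worst
        V = V_new
    return V

pi = np.array([0, 1])
print("robust  V^pi =", robust_policy_value(pi, R, gamma, candidates))
print("nominal V^pi =", policy_value(pi, P, R, gamma))
```

Because the worst case of a linear function over the convex hull of the candidate rows is attained at one of those rows, enumerating only the extreme points is enough in this sketch, which mirrors the extreme-point observation in the abstract.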
