论文标题
在已知和未知环境中的随机最短路径问题的凸双偶性
Convex duality for stochastic shortest path problems in known and unknown environments
论文作者
论文摘要
本文从凸优化的角度研究了已知和未知环境中的随机最短路径(SSP)问题。它首先回忆起已知参数案例的结果,并通过不同的证据发展理解。然后,它专注于未知参数案例,其中研究了扩展价值迭代(EVI)运算符。这包括Rosenberg等人中使用的现有操作员。 [26]和Tarbouriech等。 [31]基于L-1规范和至上规范,以及定义与其他规范和差异相对应的EVI操作员,例如KL-Divergence。本文总的来说,EVI操作员如何与凸面程序以及其双重形式的形式相关联,这些形式表现出强大的双重性。 然后,本文重点介绍了NEU和Pike-Burke [21]的有限视野研究的界限是否可以应用于SSP设置中的这些扩展价值迭代操作员。它表明,对于这些运算符而言,与[21]的相似界限相似,但是它们导致运算符在一般单调且具有更复杂的收敛属性。在特殊情况下,我们观察到振荡行为。本文通过几个需要进一步检查的示例,就研究的进展产生了公开问题。
This paper studies Stochastic Shortest Path (SSP) problems in known and unknown environments from the perspective of convex optimisation. It first recalls results in the known parameter case, and develops understanding through different proofs. It then focuses on the unknown parameter case, where it studies extended value iteration (EVI) operators. This includes the existing operators used in Rosenberg et al. [26] and Tarbouriech et al. [31] based on the l-1 norm and supremum norm, as well as defining EVI operators corresponding to other norms and divergences, such as the KL-divergence. This paper shows in general how the EVI operators relate to convex programs, and the form of their dual, where strong duality is exhibited. This paper then focuses on whether the bounds from finite horizon research of Neu and Pike-Burke [21] can be applied to these extended value iteration operators in the SSP setting. It shows that similar bounds to [21] for these operators exist, however they lead to operators that are not in general monotone and have more complex convergence properties. In a special case we observe oscillating behaviour. This paper generates open questions on how research may progress, with several examples that require further examination.