基于随机平均收益游戏和熵游戏的价值迭代的普遍复杂性界限

论文标题

基于随机平均收益游戏和熵游戏的价值迭代的普遍复杂性界限

Universal Complexity Bounds Based on Value Iteration for Stochastic Mean Payoff Games and Entropy Games

论文作者

Allamigeon, Xavier, Gaubert, Stéphane, Katz, Ricardo D., Skomra, Mateusz

论文摘要

我们开发基于价值迭代的算法，以统一的方式解决不同类别的零和零类型游戏的类型奖励。这些算法依赖于甲骨文，将动态编程操作员评估为给定精度。 We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep, where the "separation" sep is defined as the minimal difference between distinct values arising from strategies, and R is a metric estimate, involving the norm of approximate sub and super-eigenvectors of the dynamic programming operator.我们通过两个应用程序说明了此方法。第一个是一个新的证据，导致了Boros，Elbassioni，Gurvich和Makino定理的提高复杂性估计值，这表明可以在伪多项式时间内解决具有固定随机位置的基于转弯的平均付款游戏。第二个涉及熵游戏，这是Asarin，Cervelle，Degorre，Dima，Horn和Kozyakin引入的模型。熵游戏的排名被定义为由两个玩家策略确定的所有歧义矩阵中的最大排名。我们表明，具有固定排名的熵游戏可以在多项式时间内解决，并且可以在相同的固定排名条件下以伪多项式时间在伪多项式时间中求解熵游戏的扩展。

We develop value iteration-based algorithms to solve in a unified manner different classes of combinatorial zero-sum games with mean-payoff type rewards. These algorithms rely on an oracle, evaluating the dynamic programming operator up to a given precision. We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep, where the "separation" sep is defined as the minimal difference between distinct values arising from strategies, and R is a metric estimate, involving the norm of approximate sub and super-eigenvectors of the dynamic programming operator. We illustrate this method by two applications. The first one is a new proof, leading to improved complexity estimates, of a theorem of Boros, Elbassioni, Gurvich and Makino, showing that turn-based mean-payoff games with a fixed number of random positions can be solved in pseudo-polynomial time. The second one concerns entropy games, a model introduced by Asarin, Cervelle, Degorre, Dima, Horn and Kozyakin. The rank of an entropy game is defined as the maximal rank among all the ambiguity matrices determined by strategies of the two players. We show that entropy games with a fixed rank, in their original formulation, can be solved in polynomial time, and that an extension of entropy games incorporating weights can be solved in pseudo-polynomial time under the same fixed rank condition.

下载PDF全文

下载文献需遵守相关版权规定

论文标题