论文标题
在重复的游戏中,上一轮收敛和不变的遗憾,没有非对称信息
Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information
论文作者
论文摘要
本文考虑了重复的游戏,其中一个玩家比其他玩家拥有更多有关游戏的信息。特别是,我们调查了重复的两人零和游戏游戏,其中只有专栏播放器知道游戏的回报矩阵A。假设在反复玩这款游戏时,该行玩家通过使用无regret算法来最大程度地减少她(伪)的遗憾,在每轮比赛中选择了她的策略。我们为列播放器开发了一种无恒定的regret算法,以展示上一轮收敛到最小值平衡。我们表明,对于ROW播放器的大量流行的No-Regret算法,我们的算法是有效的,包括多重重量更新算法,在线镜像下降方法/关注型调查领导者,线性多重重量更新算法以及优化的多重重量更新。
This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo) regret. We develop a no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large set of popular no-regret algorithms of the row player, including the multiplicative weight update algorithm, the online mirror descent method/follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update.