Paper Title

Learning in Congestion Games with Bandit Feedback

Authors

Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Abstract

In this paper, we investigate Nash-regret minimization in congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. Then we propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not the size of the action set, which can be exponentially large in terms of the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model the non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.
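
To make the structural claim concrete: in a standard congestion game, each action is a subset of facilities, and a player's cost is the sum of per-facility costs, where each facility's cost depends only on how many players use it. The sketch below is a minimal illustration of that textbook structure, not code from the paper; the facility count and the linear cost functions are made-up placeholders. It also shows why the action set can be exponentially large in the number of facilities, which is what makes the polynomial dependence on facilities (rather than actions) meaningful.

```python
# Minimal sketch of the standard congestion-game cost structure; the
# facility count and linear cost functions are illustrative assumptions,
# not the paper's model.
from itertools import combinations

F = 4  # number of facilities

# Each facility's cost grows with the number of players using it
# (a placeholder linear form).
def facility_cost(f, n_users):
    return (f + 1) * n_users

# An action is any nonempty subset of facilities, so the action set has
# 2^F - 1 elements -- exponential in the number of facilities.
actions = [frozenset(c) for r in range(1, F + 1)
           for c in combinations(range(F), r)]
print(len(actions))  # 15 = 2^4 - 1

def player_cost(action, all_actions):
    """Cost of one player's action given every player's chosen facility subset."""
    return sum(
        facility_cost(f, sum(1 for a in all_actions if f in a))
        for f in action
    )

# Two players picking overlapping subsets: facility 1 is shared by both.
joint = [frozenset({0, 1}), frozenset({1, 3})]
print(player_cost(joint[0], joint))  # facility 0: 1*1, facility 1: 2*2 -> 5
```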
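
The decentralized algorithm is described as a combination of the Frank-Wolfe method and G-optimal design. As background, the sketch below shows only the classical Frank-Wolfe (Fedorov-Wynn) iteration for computing a G-optimal design over a finite feature set, a standard exploration building block; it is not the paper's decentralized algorithm, and the feature matrix and iteration count are illustrative assumptions. By the Kiefer-Wolfowitz theorem, the optimal design drives the maximum prediction variance down to the feature dimension d.

```python
# Generic Frank-Wolfe (Fedorov-Wynn) iteration for G-optimal design over a
# finite feature set X; illustrative background only, not the paper's
# decentralized algorithm.
import numpy as np

def g_optimal_design(X, n_iters=200):
    """X: (n, d) array of feature vectors. Returns design weights over rows of X."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)              # start from the uniform design
    for _ in range(n_iters):
        A = X.T @ (lam[:, None] * X)       # information matrix sum_x lam_x x x^T
        var = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)  # x^T A^{-1} x
        i = int(np.argmax(var))            # most uncertain direction
        g = var[i]
        gamma = (g / d - 1.0) / (g - 1.0)  # exact line-search step size
        lam *= 1.0 - gamma                 # move toward the vertex e_i
        lam[i] += gamma
    return lam

# Kiefer-Wolfowitz: at the optimum, max_x x^T A(lam)^{-1} x equals d.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
lam = g_optimal_design(X)
A = X.T @ (lam[:, None] * X)
print(np.max(np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)))  # close to 3
```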
