分散匹配市场中的匪徒学习

论文标题

分散匹配市场中的匪徒学习

Bandit Learning in Decentralized Matching Markets

论文作者

Liu, Lydia T., Ruan, Feng, Mania, Horia, Jordan, Michael I.

论文摘要

我们研究了双面匹配市场，在该市场中，市场的一方（参与者）对其对另一侧（武器）的偏好没有先验知识，并且需要从经验中学习其偏好。另外，我们认为玩家没有直接的交流手段。该模型将标准的随机多臂强盗框架扩展到具有竞争的分散多个玩家设置。我们为此设置介绍了一种新的算法，该算法在一个时间范围内$ \ Mathcal {o}（\ log（t））$稳定的遗憾是分享武器的偏好而不是$ \ Mathcal {o}（\ log（t）^2）$遗憾。此外，在单个播放器可能会偏离的情况下，我们表明，只要共享武器的偏好，算法都是激励兼容的，但在偏好完全一般时不一定如此。

We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition. We introduce a new algorithm for this setting that, over a time horizon $T$, attains $\mathcal{O}(\log(T))$ stable regret when preferences of the arms over players are shared, and $\mathcal{O}(\log(T)^2)$ regret when there are no assumptions on the preferences on either side. Moreover, in the setting where a single player may deviate, we show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.

下载PDF全文

下载文献需遵守相关版权规定

论文标题