Paper Title

Optimally Deceiving a Learning Leader in Stackelberg Games

Paper Authors

Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J. Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris

Paper Abstract

Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were quite different from what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment such that best responding to it maximizes the follower's utility according to his true payoffs. While this problem has been considered before, the related literature has only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to compute (near-)optimal fake payoffs, for various scenarios of the learning interaction between leader and follower.
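To make the deception concrete, here is a minimal sketch, not the paper's algorithm: the leader's learner is stood in for by a grid search over mixed commitments that queries the follower's reported best responses, and the 2x2 payoff matrices are hypothetical illustrative numbers chosen for this example. By reporting a fake matrix under which his second action is dominant, the follower steers the leader toward a commitment that is better for him under his true payoffs.

```python
import numpy as np

def fake_best_response(F, p):
    """Follower's best response to leader mixed strategy p, computed from
    the payoff matrix F the follower *reports* (rows: leader actions,
    cols: follower actions). Ties break toward column 0."""
    return int(np.argmax(p @ F))

def learn_commitment(L, F, grid=1001):
    """Crude stand-in for the leader's learning algorithm: scan a grid of
    mixed strategies, query the follower's (reported) best response, and
    keep the strategy maximizing the leader's expected utility."""
    best_p, best_u = None, -np.inf
    for x in np.linspace(0.0, 1.0, grid):
        p = np.array([x, 1.0 - x])
        j = fake_best_response(F, p)   # follower answers per reported F
        u = p @ L[:, j]                # leader's expected utility
        if u > best_u:
            best_p, best_u = p, u
    return best_p

# Illustrative 2x2 payoffs (hypothetical, not taken from the paper).
L      = np.array([[1.0, 3.0], [2.0, 1.0]])   # leader
F_true = np.array([[5.0, 4.0], [1.0, 2.0]])   # follower's true payoffs
F_fake = np.array([[0.0, 1.0], [0.0, 1.0]])   # fake: column 1 dominant

for label, F in [("truthful", F_true), ("deceptive", F_fake)]:
    p = learn_commitment(L, F)        # leader learns against reported F
    j = fake_best_response(F, p)      # follower stays consistent with F
    print(f"{label}: commitment={np.round(p, 3)}, "
          f"follower true utility={p @ F_true[:, j]:.3f}")
```

In this instance, truthful reporting leads the leader to a commitment near (0.5, 0.5) and a follower true utility of about 3.0, while the deceptive report pushes the leader to commit to his first action, raising the follower's true utility to 4.0. The paper's contribution is to show that such (near-)optimal fake payoffs can always be computed, beyond the finite payoff spaces studied in prior work.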
