End-to-End Learning and Intervention in Games

Paper Title

End-to-End Learning and Intervention in Games

Authors

Li, Jiayang; Yu, Jing; Nie, Yu Marco; Wang, Zhaoran

Abstract

In a social system, the self-interest of agents can be detrimental to the collective good, sometimes leading to social dilemmas. To resolve such a conflict, a central designer may intervene by either redesigning the system or incentivizing the agents to change their behaviors. To be effective, the designer must anticipate how the agents react to the intervention, which is dictated by their often unknown payoff functions. Therefore, learning about the agents is a prerequisite for intervention. In this paper, we provide a unified framework for learning and intervention in games. We cast the equilibria of games as individual layers and integrate them into an end-to-end optimization framework. To enable the backward propagation through the equilibria of games, we propose two approaches, respectively based on explicit and implicit differentiation. Specifically, we cast the equilibria as the solutions to variational inequalities (VIs). The explicit approach unrolls the projection method for solving VIs, while the implicit approach exploits the sensitivity of the solutions to VIs. At the core of both approaches is the differentiation through a projection operator. Moreover, we establish the correctness of both approaches and identify the conditions under which one approach is more desirable than the other. The analytical results are validated using several real-world problems.
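To make the contrast between explicit and implicit differentiation concrete, the following is a minimal sketch on a toy problem. It is an illustration under stated assumptions, not the authors' implementation. It assumes a VI over the nonnegative orthant, so the projection is an elementwise `max(0, .)`, with a hypothetical affine map F(x, theta) = A x + b(theta) and a scalar parameter theta. The names `A`, `b`, `unrolled`, and `implicit` are invented for this example. The explicit variant propagates the derivative through every step of the projection iteration x <- P(x - r F(x, theta)); the implicit variant differentiates the fixed-point condition at the solution and solves one linear system.

```python
import numpy as np

# Hypothetical toy data: A is symmetric positive definite, so F is strongly monotone.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
DB = np.array([-1.0, 0.0])  # derivative of b(theta) with respect to theta


def b(theta):
    return np.array([-theta, 1.0])


def F(x, theta):
    return A @ x + b(theta)


def unrolled(theta, r=0.2, iters=200):
    """Explicit approach: carry dx/dtheta through each projection step."""
    x = np.zeros(2)
    dx = np.zeros(2)
    for _ in range(iters):
        z = x - r * F(x, theta)            # gradient-like step before projection
        dz = dx - r * (A @ dx + DB)        # chain rule applied to the step
        mask = (z > 0).astype(float)       # Jacobian of max(0, z) is a 0/1 diagonal
        x = np.maximum(z, 0.0)             # projection onto the nonnegative orthant
        dx = mask * dz
    return x, dx


def implicit(x_star, theta, r=0.2):
    """Implicit approach: differentiate x* = P(x* - r F(x*, theta)) at the solution."""
    z = x_star - r * F(x_star, theta)
    J = np.diag((z > 0).astype(float))     # projection Jacobian at the fixed point
    I = np.eye(2)
    M = I - J @ (I - r * A)                # linearized fixed-point equation: M dx = -r J db
    return np.linalg.solve(M, -r * J @ DB)


x_star, dx_unrolled = unrolled(1.0)
dx_implicit = implicit(x_star, 1.0)
print(x_star, dx_unrolled, dx_implicit)
```

In this toy case both routes pass through the same object, the Jacobian of the projection, which mirrors the abstract's remark that differentiation through a projection operator is at the core of both approaches. The unrolled version needs the full iteration history, while the implicit version needs only the converged solution and one linear solve.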
