Paper Title

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

Authors

Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell

Abstract

We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism. In an MPAG, a single agent assists N human principals who may have widely different preferences. MPAGs generalize assistance games, also known as cooperative inverse reinforcement learning games. We analyze in particular a generalization of apprenticeship learning in which the humans first perform some work to obtain utility and demonstrate their preferences, and then the robot acts to further maximize the sum of human payoffs. We show in this setting that if the game is sufficiently collegial, i.e., if the humans are responsible for obtaining a sufficient fraction of the rewards through their own actions, then their preferences are straightforwardly revealed through their work. This revelation mechanism is non-dictatorial, does not limit the possible outcomes to two alternatives, and is dominant-strategy incentive-compatible.
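The two-phase setting described in the abstract (humans work and thereby reveal preferences, then the robot maximizes the sum of payoffs) can be illustrated with a toy sketch. This is not the paper's formal model: the utility matrix, the argmax-based work phase, and the point-estimate inference below are all simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: each of N humans has a private utility
# over K task types (row i is human i's per-unit utility for each task).
N, K = 3, 4
true_utils = rng.random((N, K))

# Work phase: each human spends their effort on their own best task,
# earning utility and, in doing so, revealing their top preference.
human_choices = true_utils.argmax(axis=1)

# Inference: the robot takes each observed choice as a point estimate
# of that human's most-preferred task (a crude stand-in for the
# preference inference mechanism in the paper).
inferred_top = human_choices

# Assistance phase: the robot directs one unit of work per human at
# that human's inferred favorite task, and collects the resulting
# sum of true utilities.
robot_payoff = sum(true_utils[i, inferred_top[i]] for i in range(N))

# Because working on one's true best task is also what reveals it,
# truthful effort is the humans' best move here: misreporting (working
# on a worse task) would only redirect the robot's help to that worse
# task, which is the intuition behind incentive compatibility.
assert all(inferred_top[i] == true_utils[i].argmax() for i in range(N))
```

The key point the sketch mirrors is that revelation happens through productive work itself, not through a separate voting or reporting step, which is how the mechanism sidesteps Gibbard's theorem.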
