Paper Title
Sequential Information Design: Learning to Persuade in the Dark
Paper Authors
Abstract
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events, and, thus, has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information structures. This is crucial for designing efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax the persuasiveness requirement by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes the realizations of all random events -- we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem -- we design an algorithm that, given an $\alpha \in [1/2, 1]$ as input, ensures $\tilde{O}(T^{\alpha})$ and $\tilde{O}(T^{\max\{\alpha,\, 1-\frac{\alpha}{2}\}})$ regret for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is essentially tight.
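To make the bandit-feedback trade-off concrete, the following sketch (not code from the paper, just the stated exponents) evaluates the sender exponent $\alpha$ and the receiver exponent $\max\{\alpha, 1-\alpha/2\}$ for a few choices of $\alpha \in [1/2, 1]$:

```python
def regret_exponents(alpha: float) -> tuple[float, float]:
    """Return (sender_exponent, receiver_exponent) for the bandit-feedback
    bounds O~(T^alpha) and O~(T^{max(alpha, 1 - alpha/2)})."""
    assert 0.5 <= alpha <= 1.0, "alpha must lie in [1/2, 1]"
    sender = alpha
    receiver = max(alpha, 1 - alpha / 2)
    return sender, receiver

# Smaller alpha lowers the sender's regret but raises the receiver's,
# and vice versa; the two exponents coincide at alpha = 2/3.
for alpha in (0.5, 2 / 3, 0.75, 1.0):
    s, r = regret_exponents(alpha)
    print(f"alpha={alpha:.3f}: sender T^{s:.3f}, receiver T^{r:.3f}")
```

Note that the receiver's exponent is minimized at $\alpha = 2/3$, where both parties incur $\tilde{O}(T^{2/3})$ regret; pushing $\alpha$ below $2/3$ only helps the sender.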