Paper Title
Incentivising Exploration and Recommendations for Contextual Bandits with Payments
Paper Authors
Paper Abstract
We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theoretical bounds on the cumulative costs of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
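To make the payment mechanism concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes a linear reward model, uses LinUCB as a stand-in exploration rule, and assumes a myopic user picks the arm with the highest estimated reward plus payment, breaking ties toward the platform's recommendation. Under those assumptions, the smallest payment that moves the user to the recommended arm is the gap between the best estimated reward and the recommended arm's estimated reward. All names (`minimal_payment`, `run_incentivized_linucb`, `alpha`, `lam`) are hypothetical.

```python
import numpy as np


def minimal_payment(est_rewards, target_arm):
    """Smallest payment making a myopic user weakly prefer target_arm.

    Assumption: the user maximizes estimated reward plus payment and
    breaks ties in favor of the recommended arm.
    """
    est_rewards = np.asarray(est_rewards, dtype=float)
    return float(est_rewards.max() - est_rewards[target_arm])


def run_incentivized_linucb(contexts, theta_true, alpha=1.0, lam=1.0,
                            noise=0.0, seed=0):
    """Simulate payments and regret for a LinUCB-style platform.

    contexts: array of shape (T, K, d), one feature vector per arm per
    round; the sequence may be arbitrary (e.g. chosen adversarially).
    theta_true: hidden parameter vector of shape (d,), used only to
    simulate rewards and to measure regret.
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    A = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # reward-weighted sum of chosen features
    total_payment = 0.0
    total_regret = 0.0
    for t in range(T):
        X = contexts[t]                       # shape (K, d)
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b                 # ridge estimate
        est = X @ theta_hat                   # what the myopic user sees
        # Confidence width sqrt(x^T A^{-1} x) for each arm.
        bonus = alpha * np.sqrt(np.einsum("kd,de,ke->k", X, A_inv, X))
        target = int(np.argmax(est + bonus))  # arm the platform wants pulled
        total_payment += minimal_payment(est, target)
        x = X[target]
        reward = x @ theta_true + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += reward * x
        true_means = X @ theta_true
        total_regret += float(true_means.max() - true_means[target])
    return total_payment, total_regret
```

In this sketch the payment is zero whenever the optimistic arm is also the greedy arm, so the platform only pays in rounds where exploration and the user's myopic choice disagree.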