使用离线强化学习对通知进行多目标优化

论文标题

使用离线强化学习对通知进行多目标优化

Multi-objective Optimization of Notifications Using Offline Reinforcement Learning

论文作者

Prabhakar, Prakruthi, Yuan, Yiping, Yang, Guangyu, Sun, Wensheng, Muralidharan, Ajith

论文摘要

移动通知系统在各种应用程序中起着重要作用，以通信，向用户发送警报和提醒，以告知他们有关新闻，事件或消息的信息。在本文中，我们将近实时的通知决策问题制定为马尔可夫决策过程，在该过程中，我们对奖励中的多个目标进行了优化。我们提出了一个端到端的离线增强学习框架，以优化顺序的通知决策。我们使用基于保守的Q学习的双重Q网络方法来应对离线学习的挑战，从而减轻了分配转移问题和Q值高估。我们说明了完全部署的系统，并通过离线和在线实验证明了拟议方法的性能和好处。

Mobile notification systems play a major role in a variety of applications to communicate, send alerts and reminders to the users to inform them about news, events or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process where we optimize for multiple objectives in the rewards. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenge of offline learning using a Double Deep Q-network method based on Conservative Q-learning that mitigates the distributional shift problem and Q-value overestimation. We illustrate our fully-deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.

下载PDF全文

下载文献需遵守相关版权规定

论文标题