MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

Paper Title

MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

Authors

Zhou, Haolin; Yang, Chaoqi; Gao, Xiaofeng; Chen, Qiong; Liu, Gongshen; Chen, Guihai

Abstract

Online Real-Time Bidding (RTB) is a complex auction game in which advertisers struggle to bid for ad impressions when a user request occurs. Considering display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), large ad platforms try to balance the trade-off among various goals in dynamics. To address the challenge, we propose a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), named MoTiAC, for the problem of bidding optimization with various goals. In MoTiAC, objective-specific agents update the global network asynchronously with different goals and perspectives, leading to a robust bidding policy. Unlike previous RL models, the proposed MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments. In addition, we mathematically prove that our model will converge to Pareto optimality. Finally, experiments on a large-scale real-world commercial dataset from Tencent verify the effectiveness of MoTiAC versus a set of recent approaches.
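
The abstract's central idea, objective-specific agents pushing updates into one shared set of parameters, can be illustrated with a toy sketch. This is not the MoTiAC algorithm: there is no actor, critic, or bidding environment here. The two quadratic objectives, the function names, and the round-robin schedule (a deterministic stand-in for asynchronous updates) are all assumptions made for illustration.

```python
# Toy illustration only: two objective-specific workers share one parameter.
# The quadratic objectives and every name below are hypothetical, not from the paper.

def grad_cost(x: float) -> float:
    """Gradient of a stand-in 'cost' objective (x - 1)^2, minimized at x = 1."""
    return 2.0 * (x - 1.0)


def grad_roi(x: float) -> float:
    """Gradient of a stand-in 'ROI' objective (x - 3)^2, minimized at x = 3."""
    return 2.0 * (x - 3.0)


def train(steps: int = 2000, lr: float = 0.01, x0: float = 10.0) -> float:
    shared = x0  # plays the role of the global network's parameters
    workers = [grad_cost, grad_roi]
    for t in range(steps):
        # Round-robin stands in for the asynchronous arrival of worker updates.
        gradient = workers[t % len(workers)](shared)
        shared -= lr * gradient
    return shared


final_x = train()
print(final_x)
```

In this one-dimensional toy, every point between the two minimizers (1 and 3) is Pareto optimal, since moving in either direction improves one objective only by worsening the other. The shared parameter settles inside that interval, which gives a rough intuition for the kind of trade-off the abstract describes.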
