Paper Title

MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

Authors

Haolin Zhou, Chaoqi Yang, Xiaofeng Gao, Qiong Chen, Gongshen Liu, Guihai Chen

Abstract

Online Real-Time Bidding (RTB) is a complex auction game in which advertisers struggle to bid for ad impressions when a user request occurs. Considering display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), large ad platforms try to dynamically balance the trade-offs among these goals. To address this challenge, we propose a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), named MoTiAC, for the problem of bidding optimization with various goals. In MoTiAC, objective-specific agents update the global network asynchronously with different goals and perspectives, leading to a robust bidding policy. Unlike previous RL models, the proposed MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments. In addition, we mathematically prove that our model will converge to Pareto optimality. Finally, experiments on a large-scale real-world commercial dataset from Tencent verify the effectiveness of MoTiAC versus a set of recent approaches.
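The idea of objective-specific agents pushing updates to one shared set of parameters can be illustrated with a toy sketch. This is not the MoTiAC algorithm: plain gradient steps on hypothetical quadratic losses stand in for actor-critic updates, a round-robin loop stands in for asynchronous arrival, and the names `make_quadratic_grad` and `train_shared_parameter` are invented for illustration.

```python
from typing import Callable, List


def make_quadratic_grad(target: float) -> Callable[[float], float]:
    """Gradient of the toy loss (theta - target)^2 for one objective (assumed form)."""
    return lambda theta: 2.0 * (theta - target)


def train_shared_parameter(
    targets: List[float],
    init: float = 5.0,
    steps: int = 200,
    lr: float = 0.05,
) -> float:
    """Let one worker per objective take turns updating a shared scalar parameter."""
    theta = init  # shared "global" parameter
    grads = [make_quadratic_grad(t) for t in targets]
    for step in range(steps):
        worker = step % len(grads)       # round-robin stands in for async arrival
        local_theta = theta              # worker pulls a copy of the global value
        theta -= lr * grads[worker](local_theta)  # worker pushes its own update
    return theta


if __name__ == "__main__":
    # Two conflicting toy objectives with optima at -1 and +1.
    print(train_shared_parameter([-1.0, 1.0]))
```

In this toy setting every point between the two targets is Pareto optimal, and the shared parameter settles inside that interval. The convergence result stated in the abstract concerns the actual MoTiAC model and is not demonstrated by this sketch.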
