Paper Title
Optimizing Offer Sets in Sub-Linear Time
Paper Authors
Paper Abstract
Personalization and recommendations are now accepted as core competencies in just about every online setting, ranging from media platforms to e-commerce to social networks. While the challenge of estimating user preferences has garnered significant attention, the operational problem of using such preferences to construct personalized offer sets for users remains difficult, particularly in modern settings where a massive number of items and millisecond response time requirements mean that even enumerating all of the items is impossible. Faced with such settings, existing techniques are either (a) entirely heuristic, with no principled justification, or (b) theoretically sound, but simply too slow to be practical. Thus motivated, we propose an algorithm for personalized offer set optimization that runs in time sub-linear in the number of items while enjoying a uniform performance guarantee. Our algorithm works for an extremely general class of problems and models of user choice that includes the mixed multinomial logit model as a special case. We achieve a sub-linear runtime by leveraging the dimensionality reduction from learning an accurate latent factor model, along with existing sub-linear time approximate near neighbor algorithms. Our algorithm can be entirely data-driven, relying on samples of the user, where a `sample' refers to the user interaction data typically collected by firms. We evaluate our approach on a massive content discovery dataset from Outbrain that includes millions of advertisements. Results show that our implementation is indeed fast and outperforms existing fast heuristics.
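For intuition only, the sketch below illustrates the general recipe the abstract describes: represent items by learned latent factors and use an approximate near neighbor index, rather than a scan over all items, to assemble a small offer set per user. It uses a simple random-hyperplane LSH as a stand-in for the more sophisticated sub-linear time near neighbor schemes referenced above; all dimensions, the bucketing scheme, and the fallback logic are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch: latent factors + an approximate near neighbor index
# (random-hyperplane LSH) so that each query scores only a small candidate
# pool instead of all N_ITEMS items. Hypothetical sizes and parameters.
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, DIM, N_BITS, OFFER_SIZE = 100_000, 32, 10, 5

# Item latent factors, e.g. learned by a matrix-factorization model.
item_factors = rng.normal(size=(N_ITEMS, DIM))

# SimHash index: bucket each item by the sign pattern of its projections
# onto N_BITS random hyperplanes.
hyperplanes = rng.normal(size=(N_BITS, DIM))

def hash_code(vectors):
    """Map vectors to integer bucket ids via signs of random projections."""
    bits = (vectors @ hyperplanes.T) > 0
    return bits.astype(np.int64) @ (1 << np.arange(N_BITS))

buckets = {}
for item_id, code in enumerate(hash_code(item_factors)):
    buckets.setdefault(int(code), []).append(item_id)

def offer_set(user_factor, k=OFFER_SIZE):
    """Return a k-item offer set for one user.

    Only items in the user's LSH bucket are scored, so per-query work scales
    with the bucket size rather than with N_ITEMS.
    """
    candidates = buckets.get(int(hash_code(user_factor[None, :])[0]), [])
    if not candidates:  # fall back to a random pool if the bucket is empty
        candidates = rng.choice(N_ITEMS, size=10 * k, replace=False).tolist()
    cand = np.asarray(candidates)
    scores = item_factors[cand] @ user_factor  # proxy for predicted utility
    return cand[np.argsort(-scores)[:k]]

# A user's latent factor, e.g. inferred from interaction samples.
user = rng.normal(size=DIM)
print(offer_set(user))
```

In this toy version the candidate pool is a single hash bucket and the "utility" is a plain inner product; the paper's guarantees concern a much more general choice-model setting, so this is only meant to convey why near neighbor search makes the per-query cost sub-linear.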