Paper title
Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
Paper authors
Paper abstract
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production. Unfortunately, widely used off-policy evaluation methods either make strong assumptions about how users behave that can lead to excessive bias, or they make fewer assumptions and suffer from large variance. We tackle this problem by developing a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings, namely the position-based model and the item-position model. In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model, while providing an adaptable bias-variance trade-off compared to the item-position model. We provide theoretical arguments as well as empirical results that highlight the performance of our novel estimation approach.