Paper Title
Accelerated Convergence for Counterfactual Learning to Rank
Paper Authors
Paper Abstract
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging because user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from this variance: convergence is slow, especially when large IPS weights are present. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results from extensive experimentation in a number of biased LTR scenarios, across optimizers, batch sizes, and different degrees of position bias.
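The abstract names the variance problem but does not spell out either estimator, so the following is a minimal, hypothetical Python sketch: it contrasts a standard IPS-weighted SGD gradient estimate with a weighted-sampling estimator that has the same expectation, which is one common way to trade IPS weighting for sampling. Everything here (the toy linear model, grad, and the variables X, y, p, w) is an illustrative assumption, not the paper's actual implementation of CounterSample.

    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 1000, 5
    X = rng.normal(size=(n, d))          # feature vectors of logged items
    y = rng.binomial(1, 0.3, size=n)     # logged clicks (biased feedback)
    p = rng.uniform(0.05, 1.0, size=n)   # examination propensities
    w = 1.0 / p                          # IPS weights; large when p is small

    theta = np.zeros(d)

    def grad(theta, x, y):
        """Gradient of a squared loss for a toy linear scorer."""
        return (x @ theta - y) * x

    # (1) Standard IPS-weighted SGD: sample an example uniformly and scale
    #     its gradient by the IPS weight. Unbiased, but a large w[i]
    #     inflates the variance of the step.
    i = rng.integers(n)
    g_ips = w[i] * grad(theta, X[i], y[i])

    # (2) Sampling-based alternative (assumed mechanism): draw examples
    #     with probability proportional to their IPS weight, then scale
    #     the gradient by the mean weight instead of the per-example one.
    probs = w / w.sum()
    j = rng.choice(n, p=probs)
    g_sample = w.mean() * grad(theta, X[j], y[j])

    eta = 0.01
    theta -= eta * g_sample  # one SGD step with the lower-variance estimate

Both estimators have the same expectation, (1/n) * sum_i w_i * grad_i(theta), but in the sampling-based version the per-step magnitude is governed by the mean weight rather than by the largest weight, which illustrates the intuition for faster convergence when some propensities are very small.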