Paper Title
Unifying Online and Counterfactual Learning to Rank
Paper Authors
Paper Abstract
Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for this task are divided into online approaches, which learn by directly interacting with users, and counterfactual approaches, which learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions. We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division, as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effects of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experiments, conducted in a semi-synthetic setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
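To make the correction mechanism concrete, here is a minimal Python sketch of an affine click correction marginalized over several deployed logging policies, in the spirit of the estimator described above. It assumes a trust-bias click model where P(click | doc at rank k, relevance r) = alpha[k] * r + beta[k]; the function names, parameters, and the exact correction form are illustrative assumptions, not the paper's implementation.

# A minimal sketch, assuming a trust-bias click model with rank-dependent
# parameters alpha (click-rate gain for relevant items) and beta (click
# rate on irrelevant items). During logging, several policies were
# deployed in sequence (online interventions); the correction averages
# the bias parameters over that mixture. All names are illustrative.
import numpy as np

def expected_bias(rank_probs_per_policy, policy_weights, alpha, beta):
    """Expected alpha/beta for one document, marginalized over the
    mixture of logging policies deployed during data gathering.

    rank_probs_per_policy: (n_policies, n_ranks) array; row m gives
        P(doc shown at rank k | policy m) for this document.
    policy_weights: (n_policies,) fraction of logged queries served by
        each policy (the online-intervention schedule).
    alpha, beta: (n_ranks,) trust-bias parameters per rank.
    """
    # P(doc at rank k), averaged over the deployed policies.
    rank_probs = policy_weights @ rank_probs_per_policy
    return rank_probs @ alpha, rank_probs @ beta

def corrected_relevance(click, alpha_bar, beta_bar):
    # Affine correction: subtract the expected click rate of an
    # irrelevant item (beta_bar), then reweight by the expected extra
    # click rate a relevant item would receive (alpha_bar).
    return (click - beta_bar) / alpha_bar

# Usage: two logging policies over three display ranks; the second
# policy is an intervention that served 40% of the logged traffic.
rank_probs = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.5, 0.3]])
w = np.array([0.6, 0.4])
alpha = np.array([0.90, 0.60, 0.30])
beta = np.array([0.10, 0.05, 0.02])
a_bar, b_bar = expected_bias(rank_probs, w, alpha, beta)
rel_hat = corrected_relevance(click=1.0, alpha_bar=a_bar, beta_bar=b_bar)

Averaging the bias parameters over every policy deployed during logging, rather than assuming a single static logging policy, is what lets such an estimator account for online interventions; with a single policy the mixture collapses and the correction reduces to a standard counterfactual one.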