Paper Title

Learning Neural Ranking Models Online from Implicit User Feedback

Paper Authors

Yiling Jia, Hongning Wang

Paper Abstract

Existing online learning to rank (OL2R) solutions are limited to linear models, which are incompetent to capture possible non-linear relations between queries and documents. In this work, to unleash the power of representation learning in OL2R, we propose to directly learn a neural ranking model from users' implicit feedback (e.g., clicks) collected on the fly. We focus on RankNet and LambdaRank, due to their great empirical success and wide adoption in offline settings, and control the notorious explore-exploit trade-off based on the convergence analysis of neural networks using neural tangent kernel. Specifically, in each round of result serving, exploration is only performed on document pairs where the predicted rank order between the two documents is uncertain; otherwise, the ranker's predicted order will be followed in result ranking. We prove that under standard assumptions our OL2R solution achieves a gap-dependent upper regret bound of $O(\log^2(T))$, in which the regret is defined on the total number of mis-ordered pairs over $T$ rounds. Comparisons against an extensive set of state-of-the-art OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.
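The core serving mechanism described above is a pairwise certainty test: two documents are ordered by the ranker's predicted scores only when the score gap exceeds an NTK-based confidence width, and exploration is reserved for the remaining uncertain pairs. Below is a minimal sketch of that test, not the authors' released implementation: the scorer architecture, the names `Scorer`, `grad_features`, `certain_order`, and the exploration weight `alpha` are all illustrative, and in the full algorithm the inverse covariance matrix `A_inv` would be updated each round from the gradients of pairs whose click feedback was observed.

```python
import itertools

import torch
import torch.nn as nn


class Scorer(nn.Module):
    """A small feed-forward ranker f(x; theta): feature vector -> relevance score."""

    def __init__(self, d_in: int, d_hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def grad_features(model: Scorer, x: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the score w.r.t. all parameters (NTK-style feature map)."""
    model.zero_grad()
    model(x).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])


def certain_order(model, x_i, x_j, A_inv, alpha):
    """True if the predicted order of (x_i, x_j) is certain: the score gap exceeds
    the confidence width alpha * sqrt(g^T A^{-1} g), where g = g(x_i) - g(x_j)."""
    with torch.no_grad():
        gap = (model(x_i) - model(x_j)).abs().item()
    g = grad_features(model, x_i) - grad_features(model, x_j)
    width = alpha * torch.sqrt(g @ A_inv @ g).item()
    return gap > width


# Toy round of result serving: partition candidate pairs by order certainty.
d, k = 8, 5
model = Scorer(d)
docs = torch.randn(k, d)           # feature vectors of k candidate documents
n_params = sum(p.numel() for p in model.parameters())
A_inv = torch.eye(n_params)        # (lambda * I)^{-1} at round 0; updated from
                                   # observed pair gradients in the full algorithm
alpha = 0.5                        # exploration weight (hypothetical value)

uncertain = [
    (i, j)
    for i, j in itertools.combinations(range(k), 2)
    if not certain_order(model, docs[i], docs[j], A_inv, alpha)
]
print(f"{len(uncertain)} of {k * (k - 1) // 2} pairs are rank-uncertain:", uncertain)
```

In this sketch, pairs that pass `certain_order` are placed in the served ranking according to the model's predicted scores, while only the rank-uncertain pairs are randomized to collect feedback, mirroring the exploit/explore split the abstract describes.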
