Paper Title
piRank: A Probabilistic Intent Based Ranking Framework for Facebook Search
Paper Authors
Paper Abstract
Numerous studies have explored machine learning approaches to search ranking, but most focus on specific, pre-defined problems; only a few examine ranking frameworks that can be applied at scale in a commercial search engine. Meanwhile, existing ranking models are often optimized for normalized discounted cumulative gain (NDCG) or online click-through rate (CTR), and both types of models are built on the assumption that high-quality training data can be easily obtained and applied well to unseen cases. In practice at Facebook search, we observed that the training data for our ML models has certain issues. First, tail query intents are hardly covered in our human rating dataset. Second, search click logs are often noisy and hard to clean up for various reasons. To address these issues, we propose a probabilistic intent based ranking framework (piRank for short), which can: 1) provide a scalable framework that addresses ranking issues for different query intents in a divide-and-conquer way; 2) improve system development agility, including iteration speed and system debuggability; 3) combine machine learning and empirical algorithmic methods in a systematic way. We conducted extensive experiments and studies on top of the Facebook search engine system and validated the effectiveness of this new ranking architecture.
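For readers unfamiliar with the NDCG metric named in the abstract, a minimal sketch of how it is commonly computed follows. It assumes the widely used exponential-gain formulation with a log2 position discount; the abstract does not say which variant the paper uses, and the function names dcg and ndcg are illustrative, not taken from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for a list of graded relevance labels.

    Assumes gain 2^rel - 1 and discount log2(position + 1), with
    positions starting at 1 (one common convention, not the only one).
    """
    return sum(
        (2 ** rel - 1) / math.log2(index + 2)
        for index, rel in enumerate(relevances)
    )


def ndcg(relevances, k=None):
    """DCG of the given ordering divided by the DCG of the ideal ordering.

    `relevances` lists the labels in the order the ranker returned them.
    `k` optionally truncates the evaluation to the top-k positions.
    """
    ranked = list(relevances)[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant results at all, define NDCG as 0 to avoid dividing by zero.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # A ranking that places the most relevant result second rather than first.
    print(ndcg([1, 3, 0, 2]))
```

A score of 1.0 means the returned order matches the ideal order of the labels. Because the metric depends entirely on those labels, query intents with little or no human rating coverage contribute little signal, which is the data gap the abstract describes for tail queries.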