Paper Title

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search

Authors

Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu, Premkumar Srinivasan, Denvy Deng, Qi Zhang, Xing Xie

Abstract

Embedding-based retrieval (EBR) is a fundamental building block in many web applications. However, EBR in sponsored search differs from other generic scenarios and is technically challenging because it must serve multiple retrieval purposes: first, it has to retrieve high-relevance ads, which may exactly serve the user's search intent; second, it needs to retrieve high-CTR ads so as to maximize overall user clicks. In this paper, we present Uni-Retriever, a novel representation learning framework developed for Bing Search, which unifies two different training modes, knowledge distillation and contrastive learning, to realize both required objectives. On one hand, the capability for high-relevance retrieval is established by distilling knowledge from the "relevance teacher model". On the other hand, the capability for high-CTR retrieval is optimized by learning to discriminate the user's clicked ads from the entire corpus. The two training modes are jointly performed as a multi-objective learning process, such that ads of high relevance and high CTR are favored by the generated embeddings. Besides the learning strategy, we also elaborate on our solution for the EBR serving pipeline, built upon a substantially optimized DiskANN, where massive-scale EBR can be performed with competitive time and memory efficiency and with high quality. We conduct comprehensive offline and online experiments to evaluate the proposed techniques, whose findings may provide useful insights for the future development of EBR systems. Uni-Retriever has been mainstreamed as the major retrieval path in Bing's production thanks to notable improvements in representation and EBR serving quality.
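
To make the multi-objective idea in the abstract concrete, the following is a minimal sketch of how a distillation term and a contrastive term could be combined into one training loss. It is not the paper's implementation: the abstract does not give the loss formulas, so the KL-divergence form of the distillation term, the in-batch-negative (InfoNCE-style) form of the contrastive term, the `temperature` and `alpha` parameters, and all function names are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two embedding vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def contrastive_loss(query_embs, ad_embs, temperature=1.0):
    """Assumed contrastive term: ad i is the clicked ad for query i,
    and every other ad in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(query_embs):
        scores = [dot(q, a) / temperature for a in ad_embs]
        total += -log_softmax(scores)[i]
    return total / len(query_embs)


def distillation_loss(query_embs, ad_embs, teacher_scores):
    """Assumed distillation term: KL(teacher || student) between the
    teacher's relevance distribution and the embedding-based distribution."""
    total = 0.0
    for q, teacher_row in zip(query_embs, teacher_scores):
        student_logp = log_softmax([dot(q, a) for a in ad_embs])
        teacher_logp = log_softmax(teacher_row)
        total += sum(
            math.exp(t) * (t - s) for t, s in zip(teacher_logp, student_logp)
        )
    return total / len(query_embs)


def joint_loss(query_embs, ad_embs, teacher_scores, alpha=0.5):
    """Weighted sum of the two objectives; alpha is a hypothetical weight."""
    kd = distillation_loss(query_embs, ad_embs, teacher_scores)
    cl = contrastive_loss(query_embs, ad_embs)
    return alpha * kd + (1.0 - alpha) * cl


# Toy batch: two queries, two ads, 2-dimensional embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
ads = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical teacher relevance scores
loss = joint_loss(queries, ads, teacher)
```

In this sketch both terms are computed from the same query and ad embeddings, so minimizing the weighted sum pushes a single embedding space toward both relevance (agreement with the teacher) and clicks (ranking the clicked ad above in-batch negatives). How the two objectives are actually weighted or scheduled in Uni-Retriever is not stated in the abstract.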
