Paper Title

Efficient Neural Query Auto Completion

Authors

Sida Wang, Weiwei Guo, Huiji Gao, Bo Long

Abstract

Query Auto Completion (QAC), as the starting point of information retrieval tasks, is critical to user experience. Generally, it has two steps: generating completed query candidates from query prefixes, and ranking them based on extracted features. A QAC system faces three major challenges: (1) QAC has a strict online latency requirement. For each keystroke, results must be returned within tens of milliseconds, which makes it difficult to use sophisticated language models. (2) For unseen queries, generated candidates are of poor quality because contextual information is not fully utilized. (3) Traditional QAC systems rely heavily on handcrafted features such as the query candidate frequency in search logs, and lack sufficient semantic understanding of the candidates. In this paper, we propose an efficient neural QAC system with effective context modeling to overcome these challenges. On the candidate generation side, the system uses as much information as possible from unseen prefixes to generate relevant candidates, increasing recall by a large margin. On the candidate ranking side, an unnormalized language model is proposed, which effectively captures the deep semantics of queries. This approach achieves better ranking performance than state-of-the-art neural ranking methods and reduces latency by $\sim$95\% compared to neural language modeling methods. Empirical results on public datasets show that our model achieves a good balance between accuracy and efficiency. The system is served in LinkedIn job search, with significant product impact observed.
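The two-step structure described in the abstract (candidate generation, then ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `QUERY_LOG`, the function names, and the fallback rule for unseen prefixes (drop leading words until the remaining suffix matches a logged query, then keep the dropped words as context) are all assumptions made for illustration. The frequency-based scorer is only a placeholder for a learned ranking model such as the one the paper proposes.

```python
from collections import Counter

# Hypothetical toy query log; a real system would use search logs.
QUERY_LOG = Counter({
    "software engineer": 50,
    "software developer": 30,
    "data engineer": 20,
    "data scientist": 40,
    "senior software engineer": 10,
})


def generate_candidates(prefix, log=QUERY_LOG):
    """Step 1: produce (candidate, log_frequency) pairs for a prefix.

    Assumed fallback for unseen prefixes: drop leading words one at a
    time until some logged query starts with the remaining suffix, then
    prepend the dropped words so the context is not thrown away.
    """
    words = prefix.lower().split(" ")
    for start in range(len(words)):
        suffix = " ".join(words[start:])
        context = " ".join(words[:start])
        matches = [q for q in log if q.startswith(suffix)]
        if matches:
            return [((context + " " + q).strip(), log[q]) for q in matches]
    return []


def rank_candidates(candidates, score=lambda cand, freq: freq, top_k=3):
    """Step 2: order candidates by a score and keep the top_k.

    The default score is raw log frequency, a placeholder for a
    learned model that scores the candidate text itself.
    """
    ranked = sorted(candidates, key=lambda c: score(*c), reverse=True)
    return [cand for cand, _ in ranked[:top_k]]


def autocomplete(prefix, top_k=3):
    """Run both steps for a single keystroke's prefix."""
    return rank_candidates(generate_candidates(prefix), top_k=top_k)
```

In this sketch, `autocomplete("junior data sc")` finds no logged query starting with the full prefix, falls back to the suffix `"data sc"`, and returns the completion with the dropped word `"junior"` restored. Passing a different `score` function to `rank_candidates` is where a neural ranker would plug in.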
