Paper Title

Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts

Authors

Ahmed, Tabish; Bulathwela, Sahan

Abstract

Extracting useful information from the user history to clearly understand informational needs is a crucial feature of a proactive information retrieval system. Regarding understanding information and relevance, Wikipedia can provide the background knowledge that an intelligent system needs. This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text. We formulate two models that use entity linking to associate Wikipedia topics with the relevance model. Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts while a ranking model can improve precision by incorporating them. We also find Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.
