Paper Title

Cross-lingual Information Retrieval with BERT

Paper Authors

Zhuolin Jiang, Amro El-Jaroudi, William Hartmann, Damianos Karakos, Lingjun Zhao

Paper Abstract

Multiple neural language models have been developed recently, e.g., BERT and XLNet, and achieved impressive results in various NLP tasks including sentence classification, question answering and document ranking. In this paper, we explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual information retrieval. A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision, using home-made CLIR training data derived from parallel corpora. Experimental results of the retrieval of Lithuanian documents against short English queries show that our model is effective and outperforms the competitive baseline approaches.
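
The paper's code is not reproduced here; the sketch below only illustrates the general setup the abstract describes: an English query and a foreign-language document are encoded jointly by a pretrained multilingual BERT model, and a classification head over the pooled representation yields a relevance score for ranking. The checkpoint name `bert-base-multilingual-cased`, the binary relevance head, and the `relevance_score` helper are illustrative assumptions, not the authors' released implementation; in the paper, such a model is fine-tuned with weak supervision on query-document pairs derived from parallel corpora before it produces useful scores.

```python
# Minimal sketch (assumed setup, not the authors' code): score an English
# query against a Lithuanian document with a multilingual BERT cross-encoder.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Standard multilingual BERT checkpoint; the binary relevance head here is
# an assumption about the relevance-matching architecture.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Encode the query-document pair jointly and return P(relevant)."""
    inputs = tokenizer(
        query,
        document,
        truncation=True,
        max_length=256,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 1 is taken to be the "relevant" class in this sketch.
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Usage: rank Lithuanian documents for a short English query.
query = "economic sanctions"
docs = [
    "Vyriausybė paskelbė naujas ekonomines sankcijas.",
    "Šiandien vyko krepšinio rungtynės.",
]
ranked = sorted(docs, key=lambda d: relevance_score(query, d), reverse=True)
```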
