Paper Title
Efficient Entity Candidate Generation for Low-Resource Languages
Authors
Abstract
Candidate generation is a crucial module in entity linking. It also plays a key role in multiple NLP tasks that have been shown to benefit from leveraging knowledge bases. Nevertheless, it has often been overlooked in the monolingual English entity linking literature, as naive approaches obtain very good performance. Unfortunately, the existing approaches for English cannot be successfully transferred to poorly resourced languages. This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking, with a focus on low-resource languages. Among other contributions, we point out limitations in the evaluation conducted in previous works. We introduce a characterization of queries into types based on their difficulty, which improves the interpretability of the performance of different methods. We also propose a lightweight and simple solution based on the construction of indexes, whose design is motivated by more complex transfer-learning-based neural approaches. A thorough empirical analysis on 9 real-world datasets under 2 evaluation settings shows that our simple solution outperforms the state-of-the-art approach in terms of both quality and efficiency for almost all datasets and query types.