Paper Title

Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment

Authors

Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua

Abstract

Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performances by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signals but have not been well explored yet. In this paper, we propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performances of current EA methods are overestimated because of the name-bias of existing EA datasets. To make an objective evaluation, we propose a hard experimental setting where we select equivalent entity pairs with very different names as the test set. Under both the regular and hard settings, our method achieves significant improvements ($5.10\%$ on average Hits@$1$ in DBP$15$k) over $12$ baselines in cross-lingual and monolingual datasets. Ablation studies on different subgraphs and a case study about attribute types further demonstrate the effectiveness of our method. Source code and data can be found at https://github.com/thunlp/explore-and-evaluate.
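
The hard setting mentioned in the abstract (choosing equivalent entity pairs with very different names as the test set) can be pictured with a minimal sketch. The similarity measure (`difflib.SequenceMatcher`), the function names, the `test_ratio` parameter, and the toy pairs are all assumptions made for illustration; the abstract does not state how name difference is measured, so this is not the authors' procedure.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hard_test_split(pairs, test_ratio=0.3):
    """Split aligned name pairs into (train, test).

    The test set receives the pairs whose two names are least similar,
    so a model cannot score well by matching names alone.
    `test_ratio` is a hypothetical parameter, not taken from the paper.
    """
    ranked = sorted(pairs, key=lambda p: name_similarity(p[0], p[1]))
    n_test = int(len(ranked) * test_ratio)
    return ranked[n_test:], ranked[:n_test]


# Toy equivalent-entity name pairs (invented for illustration).
pairs = [
    ("Paris", "Paris"),
    ("Germany", "Deutschland"),
    ("New York City", "New York"),
    ("Japan", "Nippon"),
]

train, test = hard_test_split(pairs, test_ratio=0.5)
```

With these toy pairs, the two pairs whose names share almost no characters land in the test set, while the near-identical names stay in training. This mirrors the name-bias concern: a regular random split would leave many trivially matchable pairs in the test set.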
