TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations

Paper Title

TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations

Authors

Minhao Jiang, Xiangchen Song, Jieyu Zhang, Jiawei Han

Abstract

Taxonomies are fundamental to many real-world applications in various domains, serving as structural representations of knowledge. To deal with the increasing volume of new concepts that need to be organized into taxonomies, researchers have turned to automatically completing existing taxonomies with new concepts. In this paper, we propose TaxoEnrich, a new taxonomy completion framework that effectively leverages both semantic features and structural information in the existing taxonomy and offers better representations of candidate positions to boost the performance of taxonomy completion. Specifically, TaxoEnrich consists of four components: (1) a taxonomy-contextualized embedding that incorporates both the semantic meanings of concepts and taxonomic relations, based on powerful pretrained language models; (2) a taxonomy-aware sequential encoder that learns candidate position representations by encoding the structural information of the taxonomy; (3) a query-aware sibling encoder that adaptively aggregates candidate siblings, weighting each by its importance to the query-position matching, to augment candidate position representations; (4) a query-position matching model that extends existing work with our new candidate position representations. Extensive experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance across all evaluation metrics and outperforms previous state-of-the-art methods by a large margin.
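The abstract describes components (3) and (4) only at a high level. The following is a minimal, hypothetical sketch, not the authors' implementation. It assumes scaled dot-product attention for the query-aware sibling encoder, an element-wise mean to combine parent, child and sibling vectors into a candidate position vector, and a sigmoid of a dot product as the matching score. The function names `aggregate_siblings` and `match_score` and the toy 2-D vectors are illustrative only; in the framework described above, the inputs would instead come from the learned embeddings and encoders.

```python
import math
from typing import List, Sequence, Tuple

# Toy embeddings are plain lists of floats. A real system would use vectors
# derived from pretrained language models (assumption for illustration).
Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_siblings(query: Vector, siblings: Sequence[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical query-aware sibling summary via scaled dot-product attention.

    Siblings more similar to the query receive larger weights. Returns the
    weighted average of the sibling vectors and the attention weights.
    """
    d = len(query)
    if not siblings:
        # No siblings: contribute a zero vector and no weights.
        return [0.0] * d, []
    weights = softmax([dot(query, s) / math.sqrt(d) for s in siblings])
    summary = [sum(w * s[i] for w, s in zip(weights, siblings)) for i in range(d)]
    return summary, weights


def match_score(query: Vector, parent: Vector, child: Vector, siblings: Sequence[Vector]) -> float:
    """Hypothetical query-position score in (0, 1).

    The position vector is the element-wise mean of the parent, child and
    attended-sibling vectors; the score is the sigmoid of its dot product
    with the query.
    """
    summary, _ = aggregate_siblings(query, siblings)
    position = [(p + c + s) / 3.0 for p, c, s in zip(parent, child, summary)]
    return 1.0 / (1.0 + math.exp(-dot(query, position)))


# Usage: rank two candidate positions for the same query concept.
query = [1.0, 0.0]
good = match_score(query, parent=[1.0, 0.0], child=[0.9, 0.1], siblings=[[1.0, 0.0], [0.0, 1.0]])
bad = match_score(query, parent=[0.0, 1.0], child=[0.0, 1.0], siblings=[[0.0, 1.0]])
print(good > bad)  # True
```

Attention is what makes the sibling summary query-aware in this sketch: the same set of siblings yields different weights, and therefore a different position vector, for a different query.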
