Paper title
Nested Named Entity Recognition as Holistic Structure Parsing
Paper authors
Paper abstract
As a fundamental natural language processing task and one of the core knowledge extraction techniques, named entity recognition (NER) is widely used to extract information from text for downstream tasks. Nested NER is a branch of NER in which named entities (NEs) are nested within one another. However, most previous studies on nested NER model nested NEs with a linear structure, even though the NEs are actually organized hierarchically. To address this mismatch, this work models all the nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm that reveals all of the NEs at once. In addition, there is currently no research on applying corpus-level information to NER. To make up for this missing information, we introduce point-wise mutual information (PMI) and other frequency features derived from corpus-aware statistics, extending the holistic modeling from the sentence level to the corpus level for further performance gains. Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving state-of-the-art performance. Further empirical studies show that the proposed corpus-aware features substantially improve NER domain adaptation, which demonstrates the surprising advantage of the proposed corpus-level holistic structure modeling.
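To make the corpus-level PMI feature mentioned in the abstract more concrete, the following is a minimal sketch of the standard PMI definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated over adjacent token pairs. The abstract does not say how the paper computes or uses PMI, so the function name `bigram_pmi`, the restriction to adjacent tokens, and the toy corpus are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def bigram_pmi(corpus):
    """Return {(w1, w2): PMI in bits} for adjacent token pairs.

    `corpus` is a list of tokenized sentences (lists of strings).
    Hypothetical helper: probabilities are plain relative frequencies,
    with no smoothing and no minimum-count threshold.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        # Pair each token with its right-hand neighbor.
        bigrams.update(zip(sentence, sentence[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of w1
        p_y = unigrams[w2] / n_uni   # marginal probability of w2
        pmi[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return pmi


# Toy corpus, invented for illustration only.
toy_corpus = [["new", "york"], ["new", "york"], ["big", "city"]]
scores = bigram_pmi(toy_corpus)
```

A high PMI means two tokens co-occur more often than their individual frequencies would predict, which is why such a score could plausibly serve as a corpus-level cue that adjacent tokens belong to the same entity span.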