Paper title
Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing
Paper authors
Paper abstract
Nested named entity recognition (NER) has been receiving increasing attention. Recently, Fu et al. (2021) adapted a span-based constituency parser to tackle nested NER. They treat nested entities as partially observed constituency trees and propose the masked inside algorithm for partial marginalization. However, their method cannot leverage entity heads, which have been shown to be useful in entity mention detection and entity typing. In this work, we resort to more expressive structures, namely lexicalized constituency trees in which constituents are annotated with headwords, to model nested entities. We leverage the Eisner-Satta algorithm to perform partial marginalization and inference efficiently. In addition, we propose to use (1) a two-stage strategy, (2) a head regularization loss, and (3) a head-aware labeling loss to enhance performance. We conduct a thorough ablation study to investigate the role of each component. Experimentally, our method achieves state-of-the-art performance on ACE2004, ACE2005, and NNE and competitive performance on GENIA, while maintaining fast inference speed.
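To make the idea of partial marginalization over partially observed trees concrete, the following is a minimal sketch of a masked inside pass for an unlabeled, unlexicalized span-based parser. It is not the lexicalized Eisner-Satta procedure used in this work, and it is not taken from Fu et al. (2021); the function names (`inside`, `crosses`, `logsumexp`), the half-open span convention `(i, j)`, and the dictionary of span scores are all assumptions made for illustration. Any span that crosses an observed entity is assigned a score of negative infinity, so the resulting log-partition sums only over binary trees compatible with the entities.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def crosses(span, entity):
    """True if two half-open spans partially overlap (neither nests the other)."""
    (i, j), (a, b) = span, entity
    return i < a < j < b or a < i < b < j


def inside(span_scores, n, entities=()):
    """Log-partition over binary trees on n words (hypothetical helper).

    span_scores maps (i, j) to a log-potential; missing spans score 0.0.
    Spans that cross any span in `entities` are masked out.
    """
    beta = {}
    for width in range(1, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            if any(crosses((i, j), e) for e in entities):
                beta[i, j] = NEG_INF  # masked: incompatible with an entity
                continue
            score = span_scores.get((i, j), 0.0)
            if width == 1:
                beta[i, j] = score
            else:
                # Sum over all split points k of the two child spans.
                splits = [beta[i, k] + beta[k, j] for k in range(i + 1, j)]
                beta[i, j] = score + logsumexp(splits)
    return beta[0, n]


# Partial marginal log-likelihood of the observed entity (1, 3) in a
# four-word sentence with uniform (zero) span scores.
log_z = inside({}, 4)
log_z_masked = inside({}, 4, entities=[(1, 3)])
log_likelihood = log_z_masked - log_z
```

Under these assumptions, the training signal is the difference between the masked and unmasked log-partitions. With uniform scores the quantity reduces to counting trees: two of the five binary trees over four words contain the span `(1, 3)`.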