Paper Title

NER-MQMRC: Formulating Named Entity Recognition as Multi Question Machine Reading Comprehension

Paper Authors

Anubhav Shrimal, Avi Jain, Kartik Mehta, Promod Yenigalla

Paper Abstract

NER has traditionally been formulated as a sequence labeling task. However, there has been a recent trend of posing NER as a machine reading comprehension (MRC) task (Wang et al., 2020; Mengge et al., 2020), where the entity name (or other information) is treated as the question, the text as the context, and the entity value in the text as the answer snippet. These works consider MRC based on a single question (entity) at a time. We propose posing NER as a multi-question MRC task, where multiple questions (one per entity) are considered at the same time for a single text. We propose a novel BERT-based multi-question MRC (NER-MQMRC) architecture for this formulation. The NER-MQMRC architecture takes all entities as input to BERT for learning token embeddings with self-attention, and leverages BERT-based entity representations to further improve these token embeddings for the NER task. Evaluation on three NER datasets shows that, by considering all entities together in a single pass, our proposed architecture leads to on average 2.5 times faster training and 2.3 times faster inference compared to NER-SQMRC framework based models. Further, we show that our model's performance does not degrade compared to single-question based MRC (NER-SQMRC) (Devlin et al., 2019), leading to F1 gains of +0.41%, +0.32% and +0.27% for the AE-Pub, Ecommerce5PT and Twitter datasets respectively. We propose this architecture primarily to solve large-scale e-commerce attribute (or entity) extraction from unstructured text, on the order of 50k+ attributes, in a scalable production environment with high performance and optimised training and inference runtimes.
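The core efficiency difference between the two formulations can be sketched as input construction: NER-SQMRC builds one (question, context) pair per entity, requiring one encoder pass each, while NER-MQMRC packs all entity names into a single input, requiring one pass total. The sketch below is purely illustrative, assuming a BERT-style `[CLS]`/`[SEP]` packing; the paper's exact input format, entity names, and example text are not specified here and are hypothetical.

```python
def build_sqmrc_inputs(entities, text):
    """NER-SQMRC style: one (question, context) pair per entity,
    so the encoder runs once per entity."""
    return [f"[CLS] {e} [SEP] {text} [SEP]" for e in entities]

def build_mqmrc_input(entities, text):
    """NER-MQMRC style: all entity names packed into a single input,
    so one encoder pass covers every entity. The exact packing used
    in the paper may differ; this layout is an assumption."""
    return "[CLS] " + " ".join(entities) + f" [SEP] {text} [SEP]"

# Hypothetical e-commerce attributes and product text.
entities = ["brand", "color", "material"]
text = "Nike running shoes in black mesh"

sq = build_sqmrc_inputs(entities, text)
mq = build_mqmrc_input(entities, text)

print(len(sq))  # 3 -> three encoder passes under SQMRC
print(mq)       # single input -> one encoder pass under MQMRC
```

Packing all questions together is what yields the reported ~2.5x training and ~2.3x inference speedups: the per-text encoder cost no longer scales linearly with the number of entities.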
