Title
Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality
Authors
Abstract
Medical entity extraction (EE) is a standard procedure used as the first stage of medical text processing. Medical EE is usually a two-step process: named entity recognition (NER) followed by named entity normalization (NEN). We propose a novel method that performs medical EE from electronic health records (EHR) as a single-step multi-label classification task by fine-tuning a transformer model pretrained on a large EHR dataset. Our model is trained end-to-end in a distantly supervised manner, using targets automatically extracted from a medical knowledge base. We show that our model learns to generalize for entities that occur frequently enough, achieving human-level classification quality for the most frequent entities. Our work demonstrates that, given a sufficiently large amount of unlabeled EHR data and a medical knowledge base, medical entity extraction can be performed end-to-end without human supervision and with human-level quality.
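The single-step framing replaces NER followed by NEN with one multi-label decision per record: each knowledge-base concept gets its own output, and the distant targets mark which concepts a record mentions. The sketch below illustrates only that framing, under loud assumptions. The knowledge base `TOY_KB` and its concept IDs are hypothetical, the distant targets come from naive whole-word dictionary matching (the abstract does not specify the actual extraction procedure), and the logits that the fine-tuned transformer would produce are simply passed in. It is not the authors' implementation.

```python
import math
import re

# HYPOTHETICAL toy knowledge base: surface form -> concept ID.
# Entries and IDs are invented for illustration; a real system would use a
# full medical knowledge base with many synonyms per concept.
TOY_KB = {
    "headache": "C_HEADACHE",
    "cephalalgia": "C_HEADACHE",
    "hypertension": "C_HYPERTENSION",
    "high blood pressure": "C_HYPERTENSION",
    "aspirin": "C_ASPIRIN",
}

# Fixed label space: one independent output per concept (multi-label, not multi-class).
LABELS = sorted(set(TOY_KB.values()))


def distant_targets(text):
    """Multi-hot target vector: 1 for every concept whose surface form occurs in text.

    ASSUMPTION: plain whole-word string matching stands in for the paper's
    unspecified knowledge-base target extraction.
    """
    lowered = text.lower()
    found = set()
    for surface, concept in TOY_KB.items():
        # \b boundaries prevent matches inside longer tokens.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(concept)
    return [1 if label in found else 0 for label in LABELS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over per-concept logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


def predict(logits, threshold=0.5):
    """Return every concept whose sigmoid probability reaches the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    note = "Patient reports cephalalgia and high blood pressure."
    y = distant_targets(note)
    print(LABELS, y)
    print(multilabel_bce([0.0, 0.0, 0.0], y))
    print(predict([-2.0, 3.0, 0.5]))
```

Because each concept has its own sigmoid, a record can be assigned any number of concepts, and synonyms such as "cephalalgia" and "headache" collapse onto the same label without a separate normalization step.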