Paper Title
Improving NER's Performance with Massive Financial Corpus
Paper Authors
Paper Abstract
Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are too expensive for small businesses. We start a company-name recognition task with small-scale, low-quality training data, then apply a set of techniques to improve model training speed and prediction performance at minimal labor cost. The methods we use involve pre-training a lite language model such as Albert-small or Electra-small on a financial corpus, knowledge distillation, and multi-stage learning. As a result, we raised the recall rate by nearly 20 points and achieved inference 4 times as fast as the BERT-CRF model.
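The abstract names knowledge distillation as one of the techniques used to speed up the NER model. The paper excerpt does not give the loss formulation, so the following is only a minimal sketch of a common token-level distillation objective for sequence labeling, assuming a PyTorch setup; the function name `distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative, not taken from the paper.

```python
# Hypothetical sketch: token-level knowledge distillation for NER.
# A lite student (e.g. an Albert-small/Electra-small tagger) is trained to
# match a larger teacher's per-token tag distribution (soft targets) while
# still fitting the gold annotations (hard targets).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, ignore_index=-100):
    """student_logits, teacher_logits: (batch, seq_len, num_tags);
    labels: (batch, seq_len), with ignore_index marking padding/subwords."""
    # Soft term: KL divergence between temperature-smoothed distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # scale so gradients stay comparable to plain CE

    # Hard term: ordinary cross-entropy against the annotated tags.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In this formulation the teacher runs in inference mode only (no gradients), so the student, being much smaller, is what delivers the faster prediction speed reported in the abstract.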