Paper Title
Improving NER's Performance with Massive Financial Corpus
Paper Authors
Paper Abstract
Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are too expensive for small businesses. We start a company-name recognition task with small-scale, low-quality training data, then apply a set of techniques to improve model training speed and prediction performance at minimal labor cost. The methods we use involve pre-training a lite language model such as Albert-small or Electra-small on a financial corpus, knowledge distillation, and multi-stage learning. As a result, we raised the recall rate by nearly 20 points and achieved inference 4 times as fast as the BERT-CRF model.
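The abstract names knowledge distillation as one of the techniques used to speed up the NER model. The paper excerpt does not give the loss formulation, so the following is only a minimal sketch of a common token-level distillation objective for sequence labeling, assuming a PyTorch setup; the function name `distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative, not taken from the paper.

```python
# Hypothetical sketch: token-level knowledge distillation for NER.
# A lite student (e.g. an Albert-small/Electra-small tagger) is trained to
# match a larger teacher's per-token tag distribution (soft targets) while
# still fitting the gold annotations (hard targets).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, ignore_index=-100):
    """student_logits, teacher_logits: (batch, seq_len, num_tags);
    labels: (batch, seq_len), with ignore_index marking padding/subwords."""
    # Soft term: KL divergence between temperature-smoothed distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # scale so gradients stay comparable to plain CE

    # Hard term: ordinary cross-entropy against the annotated tags.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In this formulation the teacher runs in inference mode only (no gradients), so the student, being much smaller, is what delivers the faster prediction speed reported in the abstract.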