Paper Title
On the Accuracy of CRNNs for Line-Based OCR: A Multi-Parameter Evaluation
Paper Authors
Abstract
We investigate how to train a high-quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper. Through extensive grid searches, we obtain a neural network architecture and a set of optimal data augmentation settings. We discuss the influence of factors such as binarization, input line height, network width, network depth, and other network training parameters such as dropout. Implementing these findings into a practical model, we are able to obtain a 0.44% character error rate (CER) model from only 10,000 lines of training data, outperforming currently available pretrained models that were trained on more than 20 times the amount of data. We show ablations for all components of our training pipeline, which relies on the open-source framework Calamari.
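For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the total Levenshtein edit distance between predicted and reference lines, divided by the total number of reference characters. The function names `levenshtein` and `character_error_rate` are hypothetical, and the abstract does not state the exact normalization used, so this should be read as an illustration of the usual definition rather than the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references, predictions):
    """Assumed definition: summed edit distance over summed reference length."""
    total_edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["abcd", "efgh"]   # made-up ground-truth lines
    preds = ["abed", "efgh"]  # made-up recognizer output
    print(f"CER = {character_error_rate(refs, preds):.2%}")
```

Under this definition, a CER of 0.44% corresponds to roughly 4.4 character errors per 1,000 reference characters.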