Paper Title

Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Authors

Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

Abstract

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task, so our model outputs a pixel labeling of the input image. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets, and that parts pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to provide a fair and complete comparison between the methods.
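
As a rough illustration of what treating line segmentation as pixel-wise classification means, the sketch below assigns each pixel the class with the highest score and then compares the result with a reference labeling. It is a toy example, not the Doc-UFCN architecture: the names `label_pixels` and `iou`, the two-class setup, and the hand-written scores are all hypothetical, and Intersection-over-Union is assumed here only as one plausible metric, since the abstract does not name the metrics it uses.

```python
from typing import List

Scores = List[List[List[float]]]  # scores[row][col][class]
Labels = List[List[int]]          # labels[row][col]


def label_pixels(scores: Scores) -> Labels:
    """Assign each pixel the index of its highest-scoring class (argmax)."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def iou(pred: Labels, truth: Labels, cls: int) -> float:
    """Intersection-over-Union of one class between two label maps."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    # If the class is absent from both maps, treat the match as perfect.
    return inter / union if union else 1.0


# Toy 2x3 "image" (assumed classes: 0 = background, 1 = text line).
# In a real system these scores would come from the network's output layer.
scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]],
]
truth = [
    [0, 1, 1],
    [0, 0, 0],
]

pred = label_pixels(scores)
line_iou = iou(pred, truth, cls=1)
print(pred, line_iou)
```

In this toy case one background pixel is mislabeled as a text line, so the prediction and the reference overlap on two of the three pixels labeled as text line in either map.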
