Paper Title
Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks
Authors
Abstract
In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat the line segmentation task, and more generally the layout analysis problem, as a pixel-wise classification task; our model therefore outputs a pixel-wise labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and demonstrate that components pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models with various metrics to ensure a fair and complete comparison between the methods.
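The pixel-wise formulation can be illustrated with a minimal sketch. It assumes the network produces one score map per class, with class 0 as background and class 1 as text line. The helper names `pixel_labels` and `pixel_iou_f1` are hypothetical and are not taken from Doc-UFCN. The sketch only shows two steps: turning per-pixel scores into a label map by taking the argmax over classes, and scoring a predicted map against ground truth. Pixel-level IoU and F1 are used here as two common examples. The abstract does not state which metrics the paper actually reports.

```python
from typing import List, Tuple

# Assumed layout: scores[c][y][x] is the network's score for class c at pixel (y, x).
# Class 0 = background, class 1 = text line (an illustrative assumption).
Grid = List[List[float]]
LabelMap = List[List[int]]


def pixel_labels(scores: List[Grid]) -> LabelMap:
    """Assign each pixel the class with the highest score (argmax over classes)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            per_class = [scores[c][y][x] for c in range(num_classes)]
            # max over class indices, keyed by score; ties go to the lowest index
            row.append(max(range(num_classes), key=per_class.__getitem__))
        labels.append(row)
    return labels


def pixel_iou_f1(pred: LabelMap, truth: LabelMap, cls: int = 1) -> Tuple[float, float]:
    """Pixel-level IoU and F1 score for a single class `cls`."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == cls and t == cls:
                tp += 1  # correctly labeled text-line pixel
            elif p == cls:
                fp += 1  # predicted text line, actually background
            elif t == cls:
                fn += 1  # missed text-line pixel
    if tp + fp + fn == 0:
        return 1.0, 1.0  # class absent in both maps: treat as a perfect match
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)


# Toy 2x3 example with two score maps (background, text line).
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.6]],  # background scores
    [[0.1, 0.8, 0.9], [0.2, 0.3, 0.4]],  # text-line scores
]
labels = pixel_labels(scores)
print(labels)  # [[0, 1, 1], [0, 0, 0]]
print(pixel_iou_f1(labels, [[0, 1, 1], [0, 1, 0]]))  # (0.6666666666666666, 0.8)
```

Taking the argmax of raw scores yields the same labels as taking it after a softmax, so no normalization is needed just to produce the label map.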