Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Paper Title

Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Authors

Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

Abstract

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task; our model therefore outputs a pixel labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets, and we demonstrate that parts pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to provide a fair and complete comparison between the methods.
