Paper title
You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
Paper authors
Paper abstract
Layout Analysis (the identification of zones and their classification) is the first step, alongside line segmentation, in Optical Character Recognition and similar tasks. The ability to distinguish the main body of text from marginal text or running titles makes the difference between extracting the full text of a digitized book and producing noisy output. We show that most segmenters focus on pixel classification, and that polygonization of this output has not been a target of the latest competitions on historical documents (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We propose to shift the task, for efficiency, from pixel classification-based polygonization to object detection using isothetic rectangles. We compare the outputs of Kraken and YOLOv5 in terms of segmentation and show that the latter severely outperforms the former on small datasets (1,110 samples and below). We release two datasets for training and evaluation on historical documents, as well as a new package, YALTAi, which injects YOLOv5 into the segmentation pipeline of Kraken 4.1.
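The core idea of the abstract — treating layout zones as isothetic (axis-aligned) rectangles from an object detector rather than as pixel-classified polygons — can be illustrated with a small sketch. This is not the actual YALTAi or Kraken API; the zone names and the detection tuple format are assumptions made for illustration only (YOLOv5-style `(x1, y1, x2, y2, class_id)` boxes mapped to Kraken-style region polygons):

```python
# Illustrative sketch only, NOT the YALTAi implementation: turn YOLOv5-style
# bounding boxes into region polygons. Since an isothetic rectangle is
# axis-aligned, each detection becomes a simple 4-corner polygon.

from typing import Dict, List, Tuple

# Hypothetical zone labels; the paper's datasets define their own classes.
CLASS_NAMES = {0: "MainZone", 1: "MarginTextZone", 2: "RunningTitleZone"}

def boxes_to_regions(
    detections: List[Tuple[float, float, float, float, int]],
) -> Dict[str, List[List[Tuple[int, int]]]]:
    """Map (x1, y1, x2, y2, class_id) detections to {zone_type: [polygon, ...]},
    where each polygon is the clockwise corner list of an isothetic rectangle."""
    regions: Dict[str, List[List[Tuple[int, int]]]] = {}
    for x1, y1, x2, y2, cls in detections:
        polygon = [
            (int(x1), int(y1)),  # top-left
            (int(x2), int(y1)),  # top-right
            (int(x2), int(y2)),  # bottom-right
            (int(x1), int(y2)),  # bottom-left
        ]
        regions.setdefault(CLASS_NAMES[cls], []).append(polygon)
    return regions

# Example: one main text block and one running title on a page.
dets = [(50.0, 120.0, 900.0, 1400.0, 0), (50.0, 30.0, 900.0, 80.0, 2)]
page_regions = boxes_to_regions(dets)
```

A representation like this is what makes the object-detection framing cheaper than polygonization: four corners per zone instead of a dense pixel mask that must be vectorized afterwards.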