Paper Title

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

Paper Authors

Hu, Pengfei, Zhang, Zhenrong, Zhang, Jianshu, Du, Jun, Wu, Jiajia

Paper Abstract

Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works often use hand-crafted features and predefined rule-based functions to detect headings and resolve the hierarchical relationships between them. Both benchmarks and deep-learning-based research remain limited. Accordingly, in this paper we first introduce a standard dataset, HierDoc, comprising image samples from 650 scientific-paper documents with their content labels. We then propose a novel end-to-end model, the multimodal tree decoder (MTD), as a benchmark for ToC extraction on HierDoc. The MTD model is composed of three parts: an encoder, a classifier, and a decoder. The encoder fuses multimodal features of vision, text, and layout information for each entity of the document. The classifier then recognizes and selects the heading entities. Next, a tree-structured decoder is designed to parse the hierarchical relationships between the heading entities. To evaluate performance, both tree-edit-distance similarity (TEDS) and F1-measure are adopted. Finally, our MTD approach achieves an average TEDS of 87.2% and an average F1-measure of 88.1% on the test set of HierDoc. The code and dataset will be released at: https://github.com/Pengfei-Hu/MTD.
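To make the three-part pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of an encoder that fuses per-entity visual/textual/layout features, a heading classifier, and a tree-structured decoder that attaches each heading to an earlier heading or a virtual root. The module names, feature dimensions, concatenation-based fusion, and greedy parent-attachment decoding are illustrative assumptions only, not the authors' released implementation (see the GitHub link above for that).

```python
# Hypothetical sketch of the MTD pipeline outline (encoder -> classifier -> tree decoder).
# All names, dimensions, and decoding logic are assumptions for illustration.
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    """Fuses visual, textual, and layout features of each document entity."""

    def __init__(self, vis_dim=256, txt_dim=256, layout_dim=4, hidden=256):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + txt_dim + layout_dim, hidden)

    def forward(self, vis, txt, layout):
        # vis/txt/layout: (num_entities, dim) feature tensors, one row per entity
        return torch.relu(self.fuse(torch.cat([vis, txt, layout], dim=-1)))


class HeadingClassifier(nn.Module):
    """Scores each entity as heading vs. non-heading."""

    def __init__(self, hidden=256):
        super().__init__()
        self.proj = nn.Linear(hidden, 2)

    def forward(self, entity_feats):
        return self.proj(entity_feats)  # (num_entities, 2) logits


class TreeDecoder(nn.Module):
    """Greedy stand-in for a tree-structured decoder: each heading picks one
    parent among the earlier headings or a virtual root, yielding a ToC tree."""

    def __init__(self, hidden=256):
        super().__init__()
        self.score = nn.Bilinear(hidden, hidden, 1)

    def forward(self, heading_feats):
        parents = []
        root = torch.zeros(1, heading_feats.size(-1))
        for i in range(heading_feats.size(0)):
            candidates = torch.cat([root, heading_feats[:i]], dim=0)
            child = heading_feats[i].repeat(candidates.size(0), 1)
            scores = self.score(candidates, child).squeeze(-1)
            parents.append(int(scores.argmax()) - 1)  # -1 denotes the root
        return parents


if __name__ == "__main__":
    torch.manual_seed(0)
    n = 6  # number of detected entities on a page
    vis, txt, layout = torch.randn(n, 256), torch.randn(n, 256), torch.rand(n, 4)
    feats = MultimodalEncoder()(vis, txt, layout)
    is_heading = HeadingClassifier()(feats).argmax(dim=-1).bool()
    parents = TreeDecoder()(feats[is_heading])
    print("headings:", is_heading.tolist(), "parents:", parents)
```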
