TSRFormer: Table Structure Recognition with Transformers

Paper title

TSRFormer: Table Structure Recognition with Transformers

Authors

Weihong Lin, Zheng Sun, Chixiang Ma, Mingze Li, Jiawei Wang, Lei Sun, Qiang Huo

Abstract

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike previous methods, we formulate table separation line prediction as a line regression problem instead of an image segmentation problem and propose a new two-stage DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR), to predict separation lines from table images directly. To make the two-stage DETR framework work efficiently and effectively for the separation line prediction task, we propose two improvements: 1) a prior-enhanced matching strategy to solve the slow convergence issue of DETR; 2) a new cross-attention module that samples features directly from a high-resolution convolutional feature map, so that high localization accuracy is achieved at low computational cost. After separation line prediction, a simple relation-network-based cell merging module is used to recover spanning cells. With these new techniques, our TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW. Furthermore, we have validated the robustness of our approach to tables with complex structures, borderless cells, large blank spaces, empty or spanning cells, as well as distorted or even curved shapes, on a more challenging real-world in-house dataset.
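To make the "line regression plus set matching" idea in the abstract more concrete, here is a minimal Python sketch. It is not the SepRETR architecture and does not implement the prior-enhanced matching strategy; it only illustrates, under assumptions, how a separator can be represented as a short list of regressed coordinates and how predictions can be assigned one-to-one to ground-truth lines, as DETR-style training does. The function names `line_cost` and `match_lines`, the three-point line representation, and the brute-force search (standing in for the Hungarian algorithm) are all hypothetical choices made for illustration.

```python
from itertools import permutations


def line_cost(pred, gt):
    """Mean absolute difference between two separators.

    Assumption: each separator is a list of y-coordinates sampled
    at the same fixed x positions across the table image.
    """
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def match_lines(preds, gts):
    """One-to-one assignment of ground-truth lines to predictions.

    Brute force over all assignments, so only suitable for tiny inputs;
    a real system would use the Hungarian algorithm instead.
    Returns (assignment, total_cost), where assignment[j] is the index
    of the prediction matched to ground-truth line j.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth lines")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(line_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost


# Two ground-truth row separators (one slightly slanted) and three predictions.
gts = [[10, 12, 14], [50, 50, 50]]
preds = [[49, 51, 50], [11, 12, 13], [90, 90, 90]]
assignment, total = match_lines(preds, gts)
print(assignment, round(total, 3))
```

In this toy example the unmatched third prediction would be treated as "no line" during training. Regressing coordinates directly, rather than thresholding a segmentation mask, is what lets a slanted or curved separator be described by a handful of numbers.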
