Paper Title
Polygon-free: Unconstrained Scene Text Detection with Box Annotations
Paper Authors
Paper Abstract
Although a polygon is a more accurate representation than an upright bounding box for text detection, polygon annotations are extremely expensive and challenging to obtain. Unlike existing works that employ fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information provided by upright bounding boxes. This is made possible with a simple segmentation network, namely the Skeleton Attention Segmentation Network (SASN), which includes three vital components (i.e., channel attention, spatial attention, and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-ArT, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, while saving more than 80% of the labeling cost. We hope that PF can provide a new perspective for text detection and reduce labeling costs. The code can be found at https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervision-and-Dynamic-Self-Training.
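The abstract mentions a "soft cross-entropy loss" but does not define it. As a rough illustration only (the paper's exact formulation, weighting, and target construction may differ), a generic soft cross-entropy compares predicted class probabilities against soft (probabilistic) per-pixel targets rather than hard 0/1 labels, which is the natural choice when the supervision derived from an upright box is uncertain near text boundaries. The function name and shapes below are hypothetical, not taken from the paper's code:

```python
import numpy as np

def soft_cross_entropy(pred_logits, soft_targets, eps=1e-12):
    """Generic cross-entropy against soft targets (illustrative sketch).

    pred_logits  : (N, C) raw scores, e.g. per-pixel outputs of a
                   segmentation head flattened to N pixels and C classes.
    soft_targets : (N, C) target distributions (each row sums to 1),
                   e.g. confidence maps derived from box-level supervision.
    Returns the mean per-pixel loss as a float.
    """
    # Numerically stable softmax over the class axis.
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Standard cross-entropy, but the target may be any distribution,
    # not just a one-hot vector.
    return float(-(soft_targets * np.log(probs + eps)).sum(axis=1).mean())
```

With a one-hot target this reduces to ordinary cross-entropy; with uniform logits over two classes and target `[1, 0]` the loss is log 2.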