Paper Title

Character Region Attention For Text Spotting

Authors

Youngmin Baek, Seung Shin, Jeonghun Baek, Sungrae Park, Junyeop Lee, Daehyun Nam, Hwalsuk Lee

Abstract

A scene text spotter is composed of text detection and recognition modules. Many studies have sought to unify these modules into an end-to-end trainable model to achieve better performance. A typical architecture places the detection and recognition modules in separate branches, and RoI pooling is commonly used to let the branches share a visual feature. However, there is still room to establish a more complementary connection between the modules when adopting a recognizer that uses an attention-based decoder and a detector that represents spatial information of the character regions. This is possible because the two modules share a common sub-task, which is to find the location of the character regions. Based on this insight, we construct a tightly coupled single-pipeline model. This architecture is formed by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage. The use of a character score map helps the recognizer attend better to the character center points, and the recognition loss propagation to the detector module enhances the localization of the character regions. Also, a strengthened sharing stage allows feature rectification and boundary localization of arbitrary-shaped text regions. Extensive experiments demonstrate state-of-the-art performance on publicly available straight and curved benchmark datasets.
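
To make the idea of a character score map guiding the recognizer more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `attend_with_score_map`, the nested-list data layout, and the choice of element-wise multiplication as the attention operation are all assumptions made for illustration. The sketch only shows how a per-pixel score that is high near character centers could emphasize the matching positions of a feature map.

```python
def attend_with_score_map(features, char_score):
    """Weight a feature map by a character score map (illustrative only).

    features:   list of channels, each an H x W list of floats
    char_score: H x W list of floats in [0, 1], assumed high near character centers
    Element-wise multiplication is an assumption, not the paper's confirmed method.
    """
    height, width = len(char_score), len(char_score[0])
    for channel in features:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("feature map and score map must share spatial size")
    # Apply the same spatial weights to every channel.
    return [
        [
            [value * weight for value, weight in zip(feature_row, score_row)]
            for feature_row, score_row in zip(channel, char_score)
        ]
        for channel in features
    ]


# Hypothetical toy input: 2 channels, 1 row, 4 columns.
features = [[[1.0, 1.0, 1.0, 1.0]], [[2.0, 2.0, 2.0, 2.0]]]
char_score = [[0.0, 1.0, 0.5, 0.0]]
attended = attend_with_score_map(features, char_score)
```

In this toy input, positions where the score is zero are suppressed in every channel, while the position with score 1.0 passes through unchanged.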
