SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

Paper Title

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

Authors

Dajian Zhong, Shujing Lyu, Palaiahnakote Shivakumara, Bing Yin, Jiajia Wu, Umapada Pal, Yue Lu

Abstract

Scene text recognition is a challenging task because of complex backgrounds and the wide variation among text instances. In this paper, we propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize text in scene images. The proposed method first generates simple semantic features with the Semantic GAN and then recognizes the scene text with the Balanced Attention Module. The Semantic GAN aims to align the semantic feature distributions of the support domain and the target domain. Unlike conventional image-to-image translation methods, which operate at the image level, the Semantic GAN performs generation and discrimination at the semantic level, using a Semantic Generator Module (SGM) and a Semantic Discriminator Module (SDM). For target images (scene text images), the Semantic Generator Module generates simple semantic features that share the same feature distribution as those of support images (clear text images). The Semantic Discriminator Module distinguishes between the semantic features of the support domain and those of the target domain. In addition, a Balanced Attention Module is designed to alleviate the problem of attention drift. The Balanced Attention Module first learns a balancing parameter from the visual glimpse vector and the semantic glimpse vector, and then performs a balancing operation to obtain a balanced glimpse vector. Experiments on six benchmarks, comprising the regular datasets IIIT5K, SVT, and ICDAR2013 and the irregular datasets ICDAR2015, SVTP, and CUTE80, validate the effectiveness of the proposed method.
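The balancing step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the balancing parameter is taken to be a scalar sigmoid gate over the concatenated glimpse vectors, and the balancing operation is taken to be a convex combination of the two glimpses. The names `balance_glimpses`, `weights`, and `bias` are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def balance_glimpses(visual, semantic, weights, bias=0.0):
    """Blend a visual and a semantic glimpse vector with a scalar gate.

    Assumed form (not taken from the paper):
        beta     = sigmoid(weights . [visual; semantic] + bias)
        balanced = beta * visual + (1 - beta) * semantic
    """
    if len(visual) != len(semantic):
        raise ValueError("glimpse vectors must have the same length")
    if len(weights) != 2 * len(visual):
        raise ValueError("weights must cover the concatenated glimpses")

    concat = list(visual) + list(semantic)
    # Balancing parameter in (0, 1), computed from both glimpses.
    beta = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)
    # Convex combination: each element lies between the two inputs.
    balanced = [beta * v + (1.0 - beta) * s for v, s in zip(visual, semantic)]
    return beta, balanced


# Toy inputs standing in for learned weights and decoder glimpses.
rng = random.Random(0)
visual = [rng.uniform(-1.0, 1.0) for _ in range(4)]
semantic = [rng.uniform(-1.0, 1.0) for _ in range(4)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(8)]
beta, balanced = balance_glimpses(visual, semantic, weights)
```

In a trained model the gate weights would be learned jointly with the recognizer. Under this assumed form, a gate near 1 favors the visual glimpse and a gate near 0 favors the semantic glimpse.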
