Paper Title

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Paper Authors

Bao Hieu Tran, Thanh Le-Cong, Huu Manh Nguyen, Duc Anh Le, Thanh Hung Nguyen, Phi Le Nguyen

Paper Abstract

In the last decades, scene text recognition has gained worldwide attention from both the academic community and actual users due to its importance in a wide range of applications. Despite achievements in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortion or irregular layout. Most existing approaches mainly leverage recurrence- or convolution-based neural networks. However, while recurrent neural networks (RNNs) usually suffer from slow training speed due to sequential computation and encounter problems such as vanishing gradients or bottlenecks, CNNs endure a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of the existing approaches. The use of focal loss instead of negative log-likelihood helps the model focus more on training low-frequency samples. Moreover, to deal with distorted and irregular text, we exploit a Spatial Transformer Network (STN) to rectify text before passing it to the recognition network. We perform experiments comparing the performance of the proposed model with seven benchmarks. The numerical results show that our model achieves the best performance.
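The abstract's key training change, replacing negative log-likelihood with focal loss, can be illustrated with a minimal sketch. The standard form of focal loss is FL(p) = -(1 - p)^γ · log(p), where p is the predicted probability of the ground-truth class; setting γ = 0 recovers plain negative log-likelihood. The function name and default γ = 2 below are illustrative assumptions, not taken from the paper.

```python
import math

def focal_loss(p_correct, gamma=2.0):
    """Focal loss for a single prediction: -(1 - p)^gamma * log(p).

    p_correct: predicted probability of the ground-truth class.
    gamma: focusing parameter; gamma = 0 reduces this to the
    standard negative log-likelihood (cross-entropy) term.
    """
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

# A well-classified (high-confidence) sample is down-weighted far
# more than a hard (low-confidence) one, so gradient updates are
# dominated by hard / low-frequency samples.
easy = focal_loss(0.9)          # scaled by (1 - 0.9)^2 = 0.01 vs. NLL
hard = focal_loss(0.1)          # scaled by (1 - 0.1)^2 = 0.81 vs. NLL
print(easy, hard)
```

With γ = 2, a sample the model already classifies with probability 0.9 contributes only 1% of its negative log-likelihood to the loss, while a sample at probability 0.1 keeps 81% of it, which is the "focus on low-frequency samples" effect described above.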
