TLGAN: Document Text Localization Using Generative Adversarial Nets

Paper Title

TLGAN: document Text Localization using Generative Adversarial Nets

Authors

Dongyoung Kim, Myungsung Kwak, Eunji Won, Sejung Shin, Jeongyeon Nam

Abstract

Text localization from digital images is the first step of the optical character recognition task. Conventional text localization based on image processing performs adequately for specific examples, yet general text localization has only been achieved by recent deep-learning-based methods. Here we present document Text Localization Generative Adversarial Nets (TLGAN), deep neural networks that perform text localization from digital images. TLGAN is a versatile and easy-to-train text localization model that requires only a small amount of data. Trained on only ten labeled receipt images from the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (SROIE), TLGAN achieved 99.83% precision and 99.64% recall on the SROIE test data. TLGAN is a practical text localization solution that requires minimal effort for data labeling and model training while producing state-of-the-art performance.

Paper Title