Paper Title

Transfer Learning for Scene Text Recognition in Indian Languages

Authors

Sanjana Gunna, Rohit Saluja, C. V. Jawahar

Abstract

Scene text recognition in low-resource Indian languages is challenging because of complexities like multiple scripts, fonts, text size, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and STAR-Net to ensure generalisability. To study the effect of change in different scripts, we initially run our experiments on synthetic word images rendered using Unicode fonts. We show that the transfer of English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning techniques among Indian languages due to similarity in their n-gram distributions and visual features like the vowels and conjunct characters. We then study the transfer learning among six Indian languages with varying complexities in fonts and word length statistics. We also demonstrate that the learned features of the models transferred from other Indian languages are visually closer (and sometimes even better) to the individual model features than those transferred from English. We finally set new benchmarks for scene-text recognition on Hindi, Telugu, and Malayalam datasets from IIIT-ILST and Bangla dataset from MLT-17 by achieving 6%, 5%, 2%, and 23% gains in Word Recognition Rates (WRRs) compared to previous works. We further improve the MLT-17 Bangla results by plugging in a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets.

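The abstract describes transferring all layers of an English-pretrained scene text recogniser (CRNN or STAR-Net) to an Indian-language model before fine-tuning on the target script. Below is a minimal sketch of that setup, assuming a PyTorch-style CRNN; the model class, checkpoint path, and charset size are illustrative placeholders, not the authors' code.

```python
# Sketch (not the authors' implementation): initialise a CRNN-style recogniser
# from an English-pretrained checkpoint, swap the output layer for the target
# Indian-language character set, and fine-tune all layers with CTC loss.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Toy CRNN: CNN feature extractor -> BiLSTM -> per-timestep classifier."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(128 * 8, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                               # x: (B, 1, 32, W)
        f = self.cnn(x)                                 # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # per-column features
        seq, _ = self.rnn(f)
        return self.classifier(seq)                     # (B, T, num_classes)

# 1. Source model trained on English synthetic data (hypothetical checkpoint).
src = CRNN(num_classes=len("abcdefghijklmnopqrstuvwxyz0123456789") + 1)
src.load_state_dict(torch.load("crnn_english.pth", map_location="cpu"))

# 2. Target model for, e.g., Hindi: same backbone, new output size for its charset.
hindi_charset_size = 110                                # placeholder value
tgt = CRNN(num_classes=hindi_charset_size + 1)          # +1 for the CTC blank

# 3. Transfer all shared layers (CNN + BiLSTM); the classifier is re-initialised
#    because the label space differs between scripts.
shared = {k: v for k, v in src.state_dict().items()
          if not k.startswith("classifier")}
tgt.load_state_dict(shared, strict=False)

# 4. Fine-tune every layer on target-language word images with CTC loss.
optimizer = torch.optim.Adam(tgt.parameters(), lr=1e-4)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
```

The same recipe applies when the source model is another Indian language rather than English; only the checkpoint and the target charset change.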