Paper title
Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting
Paper authors
Abstract
End-to-end text spotting has attracted great attention recently due to its benefits for global optimization and its high maintainability in real applications. However, the input scale has always been a tough trade-off, since recognizing a small text instance usually requires enlarging the whole image, which brings high computational costs. In this paper, to address this problem, we propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework, which aims to infer images at different small but recognizable resolutions and achieve a better balance between accuracy and efficiency. Concretely, we adopt a resolution selector to dynamically decide the input resolution for each image, which is constrained by both inference accuracy and computational cost. In addition, a sequential knowledge distillation strategy is applied to the text recognition branch, so that a low-resolution input achieves performance comparable to that of a high-resolution image. The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve practicability. Extensive experiments on several text spotting benchmarks show that the proposed method vastly improves the usability of low-resolution models. The code is available at https://github.com/hikopensource/DAVAR-Lab-OCR/.
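The accuracy-versus-cost constraint on the resolution selector can be illustrated with a small sketch. This is not the learned selector from the paper: it is a hand-written stand-in that assumes a per-image error estimate is already available for each candidate scale. The function names, the candidate scales, the quadratic cost model, and the `cost_weight` parameter are all hypothetical.

```python
from typing import Dict


def relative_cost(scale: float) -> float:
    """Assumed cost model: convolutional compute grows with pixel count,
    so it scales roughly with the square of the resize factor."""
    return scale * scale


def select_scale(est_error: Dict[float, float], cost_weight: float) -> float:
    """Return the candidate scale that minimizes
    estimated error + cost_weight * relative compute cost."""
    return min(
        est_error,
        key=lambda s: est_error[s] + cost_weight * relative_cost(s),
    )


# Hypothetical error estimates (not measured results) for two images.
# Large text stays readable when downscaled; small text does not.
large_text = {0.25: 0.10, 0.5: 0.08, 1.0: 0.07}
small_text = {0.25: 0.90, 0.5: 0.40, 1.0: 0.10}

print(select_scale(large_text, cost_weight=0.2))  # picks the smallest scale
print(select_scale(small_text, cost_weight=0.2))  # keeps full resolution
```

Raising `cost_weight` pushes the choice toward smaller inputs, which mirrors the trade-off the abstract describes: an image is shrunk only as far as its text remains recognizable.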