Title

Shift Variance in Scene Text Detection

Authors

Markus Glitzner, Jan-Hendrik Neudeck, Philipp Härtinger

Abstract

Theory of convolutional neural networks suggests the property of shift equivariance, i.e., that a shifted input causes an equally shifted output. In practice, however, this is not always the case. This poses a great problem for scene text detection for which a consistent spatial response is crucial, irrespective of the position of the text in the scene. Using a simple synthetic experiment, we demonstrate the inherent shift variance of a state-of-the-art fully convolutional text detector. Furthermore, using the same experimental setting, we show how small architectural changes can lead to an improved shift equivariance and less variation of the detector output. We validate the synthetic results using a real-world training schedule on the text detection network. To quantify the amount of shift variability, we propose a metric based on well-established text detection benchmarks. While the proposed architectural changes are not able to fully recover shift equivariance, adding smoothing filters can substantially improve shift consistency on common text datasets. Considering the potentially large impact of small shifts, we propose to extend the commonly used text detection metrics by the metric described in this work, in order to be able to quantify the consistency of text detectors.
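The failure mode described in the abstract is easy to reproduce in isolation. Below is a minimal sketch (not the paper's code; it assumes PyTorch) of how plain stride-2 subsampling breaks shift equivariance, and how a small binomial smoothing filter applied before subsampling, in the spirit of the smoothing filters mentioned above, reduces the discrepancy without eliminating it:

```python
import torch
import torch.nn.functional as F

# A 1-D "feature map" with a single sharp activation, plus a copy
# shifted by one pixel.
x = torch.zeros(1, 1, 16)
x[0, 0, 7] = 1.0
x_shifted = torch.roll(x, shifts=1, dims=-1)

# Plain stride-2 subsampling: the activation sits on an odd index and
# vanishes, while the shifted copy keeps it -- a one-pixel input shift
# changes the output completely (shift variance).
subsample = lambda t: t[..., ::2]
print(subsample(x).flatten())          # all zeros
print(subsample(x_shifted).flatten())  # contains the 1.0

# Blur before subsampling (anti-aliasing): a [1, 2, 1] / 4 binomial
# kernel spreads the activation across its neighbours, so the two
# subsampled outputs become much closer, though still not identical --
# consistent with the abstract's note that smoothing improves but does
# not fully recover shift equivariance.
blur = torch.tensor([[[0.25, 0.50, 0.25]]])
blur_pool = lambda t: subsample(F.conv1d(t, blur, padding=1))
print(blur_pool(x).flatten())
print(blur_pool(x_shifted).flatten())
```

The paper's consistency metric is defined on established text detection benchmarks, and its exact formulation is given in the paper. A hedged sketch of the general idea, assuming a hypothetical `detect` callable that maps an image to a list of axis-aligned boxes, is to run the detector on an image and on a shifted copy, undo the shift on the second set of boxes, and report the mean best-match IoU:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def shift_consistency(detect, image, dx, dy):
    """Mean best-match IoU between detections on `image` and on a
    (dx, dy)-shifted copy; 1.0 indicates perfect shift equivariance.
    `detect` is a hypothetical callable: image -> list of boxes."""
    boxes = detect(image)
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # Undo the shift on the boxes predicted for the shifted image.
    boxes_back = [(x1 - dx, y1 - dy, x2 - dx, y2 - dy)
                  for (x1, y1, x2, y2) in detect(shifted)]
    if not boxes or not boxes_back:
        return 0.0
    return float(np.mean([max(box_iou(b, c) for c in boxes_back)
                          for b in boxes]))
```

Note that `np.roll` wraps content around the image border; a crop-based shift avoids this artifact and is closer to how shifted inputs would be generated in practice.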
