Intrinsic Decomposition of Document Images

Paper Title

Intrinsic Decomposition of Document Images In-the-Wild

Authors

Sagnik Das, Hassan Ahmed Sial, Ke Ma, Ramon Baldrich, Maria Vanrell, Dimitris Samaras

Abstract

Automatic document content processing is affected by artifacts caused by the shape of the paper and by non-uniform, diversely colored lighting conditions. Fully supervised methods on real data are infeasible because of the large amount of data needed. Hence, current state-of-the-art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because: (a) prior methods rely on the uniformity of local color statistics, which limits their application in real scenarios with complex document shapes and textures; and (b) the models are trained on synthetic or hybrid datasets with non-realistic, simulated lighting conditions. In this paper, we tackle these problems with two main contributions. First, a physically constrained, learning-based method that directly estimates document reflectance based on intrinsic image formation and generalizes to challenging illumination conditions. Second, a new dataset that clearly improves on previous synthetic ones by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in a self-supervised manner in which only the synthetic texture is used as a weak training signal, obviating the need for very costly ground truth with disentangled versions of shading and reflectance. The proposed approach significantly improves the generalization of document reflectance estimation in real scenes with challenging illumination. We evaluate extensively on the real benchmark datasets available for intrinsic image decomposition and document shadow removal. When used as a pre-processing step of an OCR pipeline, our reflectance estimation scheme yields a 26% improvement in character error rate (CER), demonstrating its practical applicability.
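The abstract rests on two technical notions: the intrinsic image formation model (an observed image is the per-pixel product of reflectance and shading) and the character error rate used to score OCR output. The sketch below illustrates both under stated assumptions. It is not the authors' network: their model learns to estimate reflectance from the image alone, whereas here the shading is assumed to be known, so reflectance is recovered by simple division. Shading is kept as a 3-channel array so that it can carry the color of the illumination (the multi-illuminant case). CER is computed with the common definition (Levenshtein edit distance divided by reference length); the paper's exact OCR evaluation protocol is not stated here. All function names are hypothetical.

```python
import numpy as np


def compose(reflectance: np.ndarray, shading: np.ndarray) -> np.ndarray:
    """Intrinsic image formation (assumed Lambertian model): I = R * S, element-wise.

    Shading may be 3-channel, so colored illumination tints the observed image.
    """
    return reflectance * shading


def recover_reflectance(image: np.ndarray, shading: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Invert the model when shading is KNOWN (an assumption): R = I / S.

    `eps` guards against division by zero in fully dark regions.
    """
    return image / np.maximum(shading, eps)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(hypothesis, reference) / max(len(reference), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = rng.uniform(0.2, 1.0, size=(4, 4, 3))  # toy reflectance (albedo)
    # Toy shading: a left-to-right illumination falloff, broadcast to (4, 4, 3).
    S = np.linspace(0.3, 1.0, 4)[None, :, None] * np.ones((4, 1, 3))
    I = compose(R, S)
    print(np.allclose(recover_reflectance(I, S), R))  # True
    print(cer("he1lo world", "hello world"))  # one substitution over 11 characters
```

Keeping the model multiplicative is what makes the decomposition "physically constrained": any predicted reflectance and shading must reproduce the input when multiplied back together, which is one way a self-supervised objective can be formed without disentangled ground truth.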