Paper title
Single-sample writers -- "Document Filter" and their impacts on writer identification
Abstract
Writing can be used as an important biometric modality that allows an individual to be unequivocally identified. This is possible because the writing of two different persons presents differences that can be explored either in terms of graphometric properties or by treating the manuscript as a digital image and applying image processing techniques that can properly capture different visual attributes of the image (e.g., texture). In this work, we perform a detailed study to determine whether or not using a database in which some writers contributed only a single sample may skew the results obtained in the experimental protocol. To this end, we propose what we call the "document filter". The "document filter" protocol is meant to be used as a preprocessing technique such that all the data taken from fragments of the same document are placed either in the training set or in the test set, never in both. The rationale is that the classifier must capture features of the writer, not features of other particularities that could affect the writing in a specific document (e.g., the writer's emotional state, the pen used, the paper type). Several works in the literature deal with the writer identification problem. However, the performance of writer identification systems must also be evaluated taking into account the volunteer writers who contributed a single sample during the creation of the manuscript databases. To address this open issue, a comprehensive set of experiments was performed on the IAM, BFL, and CVL databases. They show that, in the most extreme case, the recognition rate obtained using the "document filter" protocol drops from 81.80% to 50.37%.
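The core rule of the document filter (every fragment of a document goes to the same set) can be illustrated with a short Python sketch. It assumes each fragment is tagged with a writer and a source document; the `Fragment` record, its field names, and the helper functions are hypothetical. It also adopts one possible partitioning policy purely for illustration: hold out one whole document per writer for testing, and set aside writers who contributed a single document, since under the filter they cannot appear in both sets. The paper's actual partitioning may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """One image fragment (e.g., a line or texture block) cut from a document.

    Hypothetical record layout chosen for this sketch.
    """
    writer_id: str
    document_id: str
    fragment_id: int


def document_filter_split(
    fragments: List[Fragment], seed: int = 0
) -> Tuple[List[Fragment], List[Fragment], List[str]]:
    """Hold out one whole document per writer for testing (illustrative policy).

    Every fragment of a document goes to the same set, so test fragments can
    never be matched against training fragments from the same page.
    Writers with a single document cannot appear in both sets under this
    rule, so they are returned separately instead of being split.
    """
    # writer -> document -> fragments of that document
    docs_by_writer: Dict[str, Dict[str, List[Fragment]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for frag in fragments:
        docs_by_writer[frag.writer_id][frag.document_id].append(frag)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Fragment] = []
    test: List[Fragment] = []
    single_sample_writers: List[str] = []
    for writer in sorted(docs_by_writer):
        docs = docs_by_writer[writer]
        doc_ids = sorted(docs)
        if len(doc_ids) < 2:
            # Only one document: it cannot feed both training and testing.
            single_sample_writers.append(writer)
            continue
        held_out = rng.choice(doc_ids)
        for doc_id in doc_ids:
            target = test if doc_id == held_out else train
            target.extend(docs[doc_id])  # whole document, never split
    return train, test, single_sample_writers


def assert_no_document_leakage(train: List[Fragment], test: List[Fragment]) -> None:
    """Fail if any document contributes fragments to both sets."""
    shared = {f.document_id for f in train} & {f.document_id for f in test}
    assert not shared, f"documents split across sets: {sorted(shared)}"


if __name__ == "__main__":
    frags = (
        [Fragment("w1", "d1", i) for i in range(3)]
        + [Fragment("w1", "d2", i) for i in range(2)]
        + [Fragment("w2", "d3", i) for i in range(4)]
    )
    train, test, singles = document_filter_split(frags)
    assert_no_document_leakage(train, test)
    print(len(train) + len(test), singles)  # prints: 5 ['w2']
```

In this toy example, writer `w2` contributed a single document and therefore cannot be evaluated under the filter. A fragment-level split would instead place some fragments of `d3` in training and others in testing, which is exactly the situation the document filter is meant to rule out.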