Paper Title
Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri
Paper Authors
Paper Abstract
Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets, consisting of tightly cropped, individual characters from images of ancient Greek papyri, are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler divergence (KLD). Both networks use labels drawn from a crowdsourced consensus. This consensus is derived from a Normalized Distribution of Annotations (NDA) based on all annotations for a given character in the dataset. For the second network, the KLD is calculated with respect to the NDA. For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy of over 95%, increasing the classification trustworthiness. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting model misclassifications.
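The core quantities the abstract relies on can be sketched in a few lines. Below is a minimal illustration, under assumed toy values (the vote counts and softmax output are hypothetical, not from the paper's data), of how an NDA is formed from crowd annotations, how the KLD training signal is computed against that NDA, and how Shannon entropy of a model's output distribution measures classification uncertainty:

```python
import numpy as np

# Hypothetical crowd annotation counts for one character image:
# 10 votes spread over a 3-class alphabet (assumed data for illustration).
counts = np.array([7.0, 2.0, 1.0])

# Normalized Distribution of Annotations (NDA): raw vote counts
# normalized into a probability distribution over the classes.
nda = counts / counts.sum()

# A model's softmax output for the same image (assumed values).
pred = np.array([0.80, 0.15, 0.05])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy in nats; higher values indicate a less
    confident (more uncertain) output distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

# Training signal for the KLD network: divergence of the NDA from the
# model's predicted distribution.
kld_loss = kl_divergence(nda, pred)

# Uncertainty measure used to flag likely misclassifications.
uncertainty = shannon_entropy(pred)
```

In the paper's pipeline, the per-image output distributions of the CXE and KLD networks would then be fed as features to a k-nearest neighbors meta-classifier (the stacked-generalization step); the entropy values serve as a separate diagnostic for spotting samples whose predicted labels are untrustworthy.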