Paper Title
Sequence-aware multimodal page classification of Brazilian legal documents
Paper Authors
Paper Abstract
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
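The abstract's Fusion Module combines visual and textual page features, substituting a learned embedding when a modality is absent (e.g. OCR failure). A minimal numpy sketch of that forward pass follows; all dimensions, names, and the linear-plus-ReLU fusion are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

class FusionModule:
    """Sketch of modality fusion with learned missing-input embeddings.
    Shapes and the fusion function are assumed for illustration."""

    def __init__(self, vis_dim=512, txt_dim=300, out_dim=256, seed=0):
        rng = np.random.default_rng(seed)
        # "Learned" parameters, randomly initialised here for illustration.
        self.missing_vis = rng.normal(size=vis_dim)  # stands in for an absent image
        self.missing_txt = rng.normal(size=txt_dim)  # stands in for absent OCR text
        self.W = rng.normal(size=(vis_dim + txt_dim, out_dim)) * 0.01
        self.b = np.zeros(out_dim)

    def forward(self, vis=None, txt=None):
        # Substitute the learned embedding for any missing modality.
        v = vis if vis is not None else self.missing_vis
        t = txt if txt is not None else self.missing_txt
        fused = np.concatenate([v, t])
        return np.maximum(self.W.T @ fused + self.b, 0.0)  # linear + ReLU

fm = FusionModule()
page_vis = np.ones(512)
page_txt = np.ones(300)
both = fm.forward(page_vis, page_txt)
no_text = fm.forward(page_vis, None)  # text missing: learned embedding used
print(both.shape, no_text.shape)     # both fused vectors have the same shape
```

The point of the learned embeddings is that a page with a failed OCR pass still flows through the same fusion pathway instead of requiring a separate text-free model.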
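The linear-chain CRF mentioned above assigns each page a label (one of six classes here) while scoring transitions between adjacent pages' labels; decoding the best label sequence is done with the Viterbi algorithm. A small illustrative decoder, with toy log-potential scores rather than anything from the paper:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Most-likely label sequence under a linear-chain CRF.
    emissions: (T, K) per-page label scores; transitions: (K, K) scores
    for moving from label i to label j between consecutive pages."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each label
    back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then label j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back from the best final label.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example with two labels: emissions favour label 0 then label 1.
em = np.array([[1.0, 0.0], [0.0, 1.0]])
best = viterbi_decode(em, np.zeros((2, 2)))
print(best)  # → [0, 1]
```

With a transition matrix that penalises label switches, the same decoder smooths noisy per-page predictions across a lawsuit, which is the motivation for sequence-aware models in the abstract.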