Paper Title
verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT
Paper Authors
Paper Abstract
In this work, we carried out a study on the use of attention-based algorithms to automate the categorization of Brazilian case law documents. We used data from the Kollemata Project to produce two distinct datasets with adequate class systems. We then implemented a multi-class, multi-label version of BERT and fine-tuned different BERT models with the produced datasets. We evaluated several metrics, adopting the micro-averaged F1-score as our main metric, for which we obtained a performance value of F1-micro = 0.72, corresponding to a gain of 30 percentage points over the tested statistical baseline.
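The abstract does not detail the implementation, but the setup it describes (a BERT classifier with multi-label outputs evaluated with micro-averaged F1) can be illustrated with the following minimal sketch using Hugging Face Transformers and PyTorch. The checkpoint name, number of labels, example texts, and the 0.5 decision threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): multi-label BERT fine-tuning and
# micro-averaged F1 evaluation, as described at a high level in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score

MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # assumed Portuguese checkpoint
NUM_LABELS = 10                                       # hypothetical class-system size

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE-with-logits loss
)

# One training step on a toy batch: each document gets a multi-hot label vector.
texts = ["Texto de um acórdão ...", "Outro documento jurídico ..."]
labels = torch.zeros(len(texts), NUM_LABELS)
labels[0, [1, 4]] = 1.0  # document 0 belongs to classes 1 and 4
labels[1, [7]] = 1.0     # document 1 belongs to class 7

batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # an optimizer step would follow during fine-tuning

# Evaluation: threshold the sigmoid probabilities and compute micro-averaged F1.
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)
preds = (probs >= 0.5).int()
print("micro-F1:", f1_score(labels.int(), preds, average="micro"))
```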