Paper Title

An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

Paper Authors

Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

Paper Abstract

Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications and presents interesting challenges. First, not all labels are well represented in the training set, due to the very large label set and the skewed label distributions of LMTC datasets. Also, label hierarchies and differences in human labelling guidelines may affect graph-aware annotation proximity. Finally, the label hierarchies are periodically updated, requiring LMTC models capable of zero-shot generalization. Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs), which (1) typically treat LMTC as flat multi-label classification; (2) may use the label hierarchy to improve zero-shot learning, although this practice is vastly understudied; and (3) have not been combined with pre-trained Transformers (e.g. BERT), which have led to state-of-the-art results in several NLP benchmarks. Here, for the first time, we empirically evaluate a battery of LMTC methods from vanilla LWANs to hierarchical classification approaches and transfer learning, on frequent, few, and zero-shot learning on three datasets from different domains. We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs. Furthermore, we show that Transformer-based approaches outperform the state-of-the-art in two of the datasets, and we propose a new state-of-the-art method which combines BERT with LWANs. Finally, we propose new models that leverage the label hierarchy to improve few and zero-shot learning, considering on each dataset a graph-aware annotation proximity measure that we introduce.
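
For readers unfamiliar with label-wise attention, below is a minimal PyTorch sketch of a BERT + LWAN head in the spirit of the combination the abstract proposes: each label gets its own attention query over BERT's token embeddings, yielding a label-specific document representation that is scored with a per-label sigmoid. The class name, hyper-parameters, and the `bert-base-uncased` checkpoint are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of a BERT + Label-Wise Attention Network (LWAN) head.
# Assumes the Hugging Face `transformers` library; names and defaults here
# are assumptions for exposition, not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertLWAN(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One attention query vector per label (the "label-wise" part).
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))
        # One output weight vector and bias per label for the final scores.
        self.output = nn.Parameter(torch.randn(num_labels, hidden))
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, input_ids, attention_mask):
        # H: (batch, seq_len, hidden) contextual token embeddings from BERT.
        H = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        # Per-label attention scores over tokens: (batch, num_labels, seq_len).
        scores = torch.einsum("bth,lh->blt", H, self.label_queries)
        scores = scores.masked_fill(attention_mask.unsqueeze(1) == 0, -1e9)
        alpha = torch.softmax(scores, dim=-1)
        # Label-specific document representations: (batch, num_labels, hidden).
        D = torch.einsum("blt,bth->blh", alpha, H)
        # Independent sigmoid per label, i.e. flat multi-label classification.
        logits = (D * self.output).sum(-1) + self.bias
        return torch.sigmoid(logits)
```

Note the design point this makes concrete: the encoder is shared, but each label attends to different tokens of the document, which is what distinguishes LWANs from a single pooled representation fed to one large output layer.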
