Paper title
Many-Class Text Classification with Matching
Paper authors
Paper abstract
In this work, we formulate \textbf{T}ext \textbf{C}lassification as a \textbf{M}atching problem between the text and the labels, and propose a simple yet effective framework named TCM. Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels, which helps distinguish each class better when the class number is large, especially in low-resource scenarios. TCM is also easy to implement and is compatible with various large pretrained language models. We evaluate TCM on 4 text classification datasets (each with 20+ labels) in both few-shot and full-data settings, and this model demonstrates significant improvements over other text classification paradigms. We also conduct extensive experiments with different variants of TCM and discuss the underlying factors of its success. Our method and analyses offer a new perspective on text classification.
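To make the matching formulation concrete, the following is a minimal sketch of classification by text-label matching. It is not the TCM implementation: the bag-of-words `embed` function is a toy stand-in for a pretrained language model encoder, and the label descriptions, the cosine scoring rule, and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in encoder (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_matching(text, label_descriptions):
    """Score the text against every label description; return the best match."""
    text_vec = embed(text)
    scores = {
        label: cosine(text_vec, embed(description))
        for label, description in label_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical labels; each label carries its own textual description,
# so the classifier can use label semantics rather than an opaque class index.
LABELS = {
    "sports": "sports football basketball match team",
    "finance": "finance stock market bank investment",
    "weather": "weather rain temperature forecast storm",
}

predicted, scores = classify_by_matching("the team won the football match", LABELS)
print(predicted, scores)
```

Because each label is represented by its own text, adding a class in this sketch only requires adding a description to `LABELS`, which illustrates why a matching view can be convenient when the number of classes is large.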