Paper Title

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Authors

Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

Abstract

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully crafted class descriptions to obtain class-specific keywords but also require a substantial amount of unlabeled data and take a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions, as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and is often orders of magnitude faster to train.
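
The retrieve-then-train idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy bag-of-words vector as a stand-in for a real dense text encoder, a plain top-k cutoff in place of the paper's optimal subset selection, and a nearest-centroid classifier as the trained model. The class names, class descriptions, corpus, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy inputs: class descriptions and an external unlabeled corpus.
CLASS_DESCRIPTIONS = {
    "sports": "sports game team",
    "business": "business market company",
}
CORPUS = [
    "the team won the game last night",
    "the company reported strong market growth",
    "a new stadium opened for the football team",
    "the stock market fell as the company cut jobs",
    "heavy rain is expected over the weekend",
]


def build_vocab(texts):
    """Collect the sorted set of lowercase tokens seen in the given texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def embed(text, vocab):
    """Assumption: L2-normalized word counts stand in for a dense encoder."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(class_descriptions, corpus, vocab, k):
    """Pseudo-label the k corpus documents most similar to each description.

    Assumption: a fixed top-k cutoff replaces the paper's subset selection.
    """
    doc_vecs = [embed(doc, vocab) for doc in corpus]
    pseudo_labeled = {}
    for label, description in class_descriptions.items():
        query = embed(description, vocab)
        ranked = sorted(
            range(len(corpus)),
            key=lambda i: dot(query, doc_vecs[i]),
            reverse=True,
        )
        pseudo_labeled[label] = [corpus[i] for i in ranked[:k]]
    return pseudo_labeled


def train_centroids(pseudo_labeled, vocab):
    """Train a nearest-centroid classifier on the pseudo-labeled documents."""
    centroids = {}
    for label, docs in pseudo_labeled.items():
        vecs = [embed(doc, vocab) for doc in docs]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids


def predict(text, centroids, vocab):
    """Return the label whose centroid is most similar to the text."""
    vec = embed(text, vocab)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


vocab = build_vocab(list(CLASS_DESCRIPTIONS.values()) + CORPUS)
pseudo_labeled = retrieve(CLASS_DESCRIPTIONS, CORPUS, vocab, k=2)
centroids = train_centroids(pseudo_labeled, vocab)
print(predict("the football game was exciting", centroids, vocab))
```

In this sketch the class descriptions are used only as retrieval queries and are never expanded into keyword lists, which mirrors the contrast with keyword-driven methods drawn in the abstract. Swapping `embed` for a pretrained sentence encoder and the centroid model for a stronger classifier would bring it closer to a practical setup.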
