Paper Title

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

Paper Authors

Hazel Kim, Jaeman Son, Yo-Sub Han

Abstract

Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different regularization methods to prevent overfitting and improve generalization. Yet they still rely heavily on predictions initially trained with the limited labeled data as pseudo-labels and are likely to put overconfident label belief on erroneous classes depending on the first prediction. To tackle this issue in text classification, we introduce LST, a simple self-training method that uses a lexicon to guide the pseudo-labeling mechanism in a linguistically-enriched manner. We consistently refine the lexicon by predicting confidence of the unseen data to teach pseudo-labels better in the training iterations. We demonstrate that this simple yet well-crafted lexical knowledge achieves 1.0-2.0% better performance on 30 labeled samples per class for five benchmark datasets than the current state-of-the-art approaches.
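To make the mechanism the abstract describes more concrete, here is a minimal, hypothetical sketch of a lexicon-guided self-training loop. It is not the authors' implementation: the bag-of-words logistic-regression classifier, the blending weight `guide_weight`, the confidence `threshold`, and the frequency-based lexicon-refinement heuristic are all illustrative assumptions standing in for the paper's neural model and refinement procedure.

```python
# Illustrative sketch of lexicon-guided self-training (NOT the paper's exact
# method): a simple classifier's confidence is blended with lexicon-based
# class scores to assign pseudo-labels, and the lexicon is refined from
# confidently pseudo-labeled texts on each iteration.
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def lexicon_scores(text, lexicon, class_order):
    """Turn lexicon word hits per class into a probability-like distribution."""
    tokens = text.lower().split()
    hits = np.array([sum(t in lexicon.get(c, set()) for t in tokens)
                     for c in class_order], dtype=float)
    total = hits.sum()
    return hits / total if total > 0 else np.full(len(class_order), 1.0 / len(class_order))


def self_train(labeled, unlabeled, seed_lexicon, iterations=3,
               threshold=0.8, guide_weight=0.3):
    """Hypothetical loop; `guide_weight` and `threshold` are assumed knobs."""
    texts, labels = map(list, zip(*labeled))
    lexicon = {c: set(words) for c, words in seed_lexicon.items()}
    vec, clf = None, None
    for _ in range(iterations):
        # 1) (Re)train the base classifier on the current labeled pool.
        vec = CountVectorizer()
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vec.fit_transform(texts), labels)
        class_order = list(clf.classes_)

        remaining = []
        for text in unlabeled:
            # 2) Blend model confidence with lexicon guidance.
            model_probs = clf.predict_proba(vec.transform([text]))[0]
            lex_probs = lexicon_scores(text, lexicon, class_order)
            probs = (1 - guide_weight) * model_probs + guide_weight * lex_probs

            if probs.max() >= threshold:
                # 3) Accept a confident pseudo-label into the labeled pool.
                label = class_order[int(probs.argmax())]
                texts.append(text)
                labels.append(label)
                # 4) Refine the lexicon with frequent words from the confident text.
                top_words = [w for w, _ in Counter(text.lower().split()).most_common(3)]
                lexicon.setdefault(label, set()).update(top_words)
            else:
                remaining.append(text)
        unlabeled = remaining
    return clf, vec, lexicon


if __name__ == "__main__":
    # Toy data just to show the call shape.
    labeled = [("great movie , loved it", "pos"), ("terrible plot , boring", "neg")]
    unlabeled = ["loved the great acting", "boring and terrible", "what a great film"]
    seed_lexicon = {"pos": {"great", "loved"}, "neg": {"boring", "terrible"}}
    clf, vec, lexicon = self_train(labeled, unlabeled, seed_lexicon, threshold=0.6)
    print(sorted(lexicon["pos"]), sorted(lexicon["neg"]))
```

Blending the lexicon score into the pseudo-labeling decision reflects the point the abstract emphasizes: the loop should not trust the initial model's possibly overconfident predictions alone, and the lexicon itself is refined as more unseen data is confidently labeled.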
