Paper title
Active PETs: Active Data Annotation Prioritisation for Few-Shot Claim Verification with Pattern Exploiting Training
Paper authors
Paper abstract
To mitigate the impact of the scarcity of labelled data on fact-checking systems, we focus on few-shot claim verification. Although recent work on few-shot classification has proposed advanced language models, there is little research on data annotation prioritisation, that is, on selecting which few instances to label so that model performance is optimal. We propose Active PETs, a novel weighted approach that uses an ensemble of Pattern Exploiting Training (PET) models, based on various language models, to actively select unlabelled data as candidates for annotation. Using Active PETs for few-shot data selection shows consistent improvement over the baseline methods on two technical fact-checking datasets and with six different pretrained language models. We show further improvement with Active PETs-o, which additionally integrates an oversampling strategy. Our approach enables effective selection of instances to be labelled where unlabelled data is abundant but resources for labelling are limited, leading to consistently improved few-shot claim verification performance. Our code is available.
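As a rough illustration of what weighted, ensemble-based selection of annotation candidates can look like, the sketch below ranks unlabelled instances by weighted vote entropy, a standard query-by-committee disagreement measure. The abstract does not specify the scoring function or how the weights are set, so the entropy criterion, the weight values, the label names, and the function names `weighted_vote_entropy` and `select_for_annotation` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def weighted_vote_entropy(votes, weights):
    """Entropy of the weight-normalised label votes for a single instance.

    votes[m] is the label predicted by committee member m and weights[m]
    is that member's (hypothetical) voting weight.
    """
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    norm = sum(totals.values())
    entropy = 0.0
    for mass in totals.values():
        p = mass / norm
        entropy -= p * math.log(p)
    return entropy


def select_for_annotation(committee_votes, weights, k):
    """Return the indices of the k instances with the highest disagreement.

    committee_votes[m][i] is the label predicted by model m for instance i.
    """
    num_instances = len(committee_votes[0])
    scores = []
    for i in range(num_instances):
        votes = [model_votes[i] for model_votes in committee_votes]
        scores.append(weighted_vote_entropy(votes, weights))
    ranked = sorted(range(num_instances), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy committee of three models voting on four unlabelled claims.
# Label names and weights are hypothetical.
committee_votes = [
    ["SUPPORTS", "REFUTES", "SUPPORTS", "NEI"],
    ["SUPPORTS", "SUPPORTS", "SUPPORTS", "REFUTES"],
    ["SUPPORTS", "REFUTES", "REFUTES", "SUPPORTS"],
]
weights = [1.0, 2.0, 3.0]
selected = select_for_annotation(committee_votes, weights, k=2)
print(selected)  # [3, 2]
```

In this toy run, instance 3 is selected first because the three models each predict a different label, and instance 2 second because the weighted votes split evenly between two labels. Instance 0, on which all models agree, scores zero and would be the last to be sent for annotation.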