Paper Title
Contrastive Credibility Propagation for Reliable Semi-Supervised Learning
Paper Authors
Paper Abstract
Producing labels for unlabeled data is error-prone, which makes semi-supervised learning (SSL) troublesome. Often, little is known about when and why an algorithm fails to outperform a supervised baseline. Using benchmark datasets, we craft five common real-world SSL data scenarios: few-label, open-set, noisy-label, and class distribution imbalance/misalignment in the labeled and unlabeled sets. We propose a novel algorithm called Contrastive Credibility Propagation (CCP) for deep SSL via iterative transductive pseudo-label refinement. CCP unifies semi-supervised learning and noisy-label learning with the goal of reliably outperforming a supervised baseline in any data scenario. Compared to prior methods that focus on a subset of scenarios, CCP uniquely outperforms the supervised baseline in all scenarios, supporting practitioners when the qualities of labeled or unlabeled data are unknown.
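The abstract names the core mechanism, iterative transductive pseudo-label refinement, without giving details. The following is only a minimal illustrative sketch of that general idea, not the CCP algorithm itself. Everything in it is an assumption made for illustration: fixed toy feature vectors instead of a learned encoder, softmax-weighted cosine similarity for propagation, the function names `refine_pseudo_labels` and `credibility`, the `temperature` value, and a credibility score defined as each class score minus its strongest rival, clipped at zero.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def credibility(probs):
    """Assumed scoring rule: each class score minus its strongest rival, clipped at 0.

    Requires at least two classes. Ambiguous samples end up near all-zero.
    """
    scores = []
    for k, p in enumerate(probs):
        rival = max(q for j, q in enumerate(probs) if j != k)
        scores.append(max(0.0, p - rival))
    return scores

def refine_pseudo_labels(features, labels, num_classes, iterations=5, temperature=0.1):
    """Iteratively propagate soft labels from labeled to unlabeled samples.

    labels[i] is a class index for labeled samples and None for unlabeled ones.
    Labeled samples keep their one-hot vectors; unlabeled samples start at zero.
    """
    soft = [
        [1.0 if y == k else 0.0 for k in range(num_classes)]
        if y is not None else [0.0] * num_classes
        for y in labels
    ]
    n = len(features)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(soft[i])  # ground-truth labels stay fixed
                continue
            # Similarity-weighted vote over every other sample's current soft label.
            weights = [
                math.exp(cosine(features[i], features[j]) / temperature) if j != i else 0.0
                for j in range(n)
            ]
            total = sum(weights)
            probs = [
                sum(w * soft[j][k] for j, w in enumerate(weights)) / total
                for k in range(num_classes)
            ]
            updated.append(credibility(probs))
        soft = updated  # synchronous update: all samples refined from the previous round
    return soft

# Toy data: two labeled anchors, two unlabeled samples near them, one ambiguous sample.
features = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9), (1.0, 1.0)]
labels = [0, 1, None, None, None]
soft = refine_pseudo_labels(features, labels, num_classes=2)
```

In this toy setup the two unlabeled samples close to a labeled anchor receive a high score for that anchor's class, while the sample equidistant from both anchors receives scores near zero, so it would contribute almost nothing to a loss weighted by these scores. That is the kind of behavior the abstract hints at for open-set or noisy data, under the assumptions listed above.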