Contrastive Credibility Propagation for Reliable Semi-Supervised Learning

Paper Title

Contrastive Credibility Propagation for Reliable Semi-Supervised Learning

Authors

Brody Kutt, Pralay Ramteke, Xavier Mignot, Pamela Toman, Nandini Ramanan, Sujit Rokka Chhetri, Shan Huang, Min Du, William Hewlett

Abstract

Producing labels for unlabeled data is error-prone, making semi-supervised learning (SSL) troublesome. Often, little is known about when and why an algorithm fails to outperform a supervised baseline. Using benchmark datasets, we craft five common real-world SSL data scenarios: few-label, open-set, noisy-label, and class distribution imbalance/misalignment in the labeled and unlabeled sets. We propose a novel algorithm called Contrastive Credibility Propagation (CCP) for deep SSL via iterative transductive pseudo-label refinement. CCP unifies semi-supervised learning and noisy label learning for the goal of reliably outperforming a supervised baseline in any data scenario. Compared to prior methods which focus on a subset of scenarios, CCP uniquely outperforms the supervised baseline in all scenarios, supporting practitioners when the qualities of labeled or unlabeled data are unknown.
