Paper Title

PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy Labels

Authors

Arushi Goel, Yunlong Jiao, Jordan Massiah

Abstract

Acquiring accurate labels on large-scale datasets is both time-consuming and expensive. To reduce the dependence of deep learning models on clean labeled data, several recent research efforts have focused on learning with noisy labels. These methods typically fall into three design categories for learning a noise-robust model: sample selection approaches, noise-robust loss functions, and label correction methods. In this paper, we propose PARS: Pseudo-Label Aware Robust Sample Selection, a hybrid approach that combines the best of all three worlds in a joint-training framework to achieve robustness to noisy labels. Specifically, PARS exploits all training samples using both the raw/noisy labels and estimated/refurbished pseudo-labels obtained via self-training, divides samples into an ambiguous subset and a noisy subset via loss analysis, and designs label-dependent, noise-aware loss functions for both sets of filtered labels. Results show that PARS significantly outperforms the state of the art in extensive studies on the noisy CIFAR-10 and CIFAR-100 datasets, particularly in challenging high-noise and low-resource settings. In particular, PARS achieved an absolute 12% improvement in test accuracy on the CIFAR-100 dataset with 90% symmetric label noise, and an absolute 27% improvement in test accuracy when, as an additional restriction, only 1/5 of the noisy labels are available during training. On a real-world noisy dataset, Clothing1M, PARS achieves results competitive with the state of the art.
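
The sample-division step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the selection criterion, so the code assumes, purely for illustration, that per-sample cross-entropy against the given label is compared with a fixed threshold, that lower-loss samples go to the ambiguous subset and the rest to the noisy subset, and that pseudo-labels are the argmax of the model's predicted probabilities. The names `per_sample_cross_entropy`, `split_by_loss`, and `threshold` are hypothetical.

```python
import math


def per_sample_cross_entropy(probs, labels):
    """Cross-entropy of each sample against its given (possibly noisy) label."""
    eps = 1e-12  # guards against log(0)
    return [-math.log(p[y] + eps) for p, y in zip(probs, labels)]


def split_by_loss(probs, noisy_labels, threshold):
    """Split sample indices by per-sample loss (illustrative assumption).

    Samples with loss below `threshold` go to the "ambiguous" subset,
    the rest to the "noisy" subset. Pseudo-labels are taken as the
    argmax of the predicted class probabilities.
    """
    losses = per_sample_cross_entropy(probs, noisy_labels)
    pseudo_labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    ambiguous_idx = [i for i, loss in enumerate(losses) if loss < threshold]
    noisy_idx = [i for i, loss in enumerate(losses) if loss >= threshold]
    return ambiguous_idx, noisy_idx, pseudo_labels


# Toy predicted probabilities for four samples over three classes.
probs = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.7, 0.2, 0.1],
]
noisy_labels = [0, 2, 2, 1]  # given labels, some of which disagree with the model

ambiguous_idx, noisy_idx, pseudo_labels = split_by_loss(
    probs, noisy_labels, threshold=1.0
)
print(ambiguous_idx, noisy_idx, pseudo_labels)
```

In a full pipeline, each subset would then be trained with its own loss, using the raw labels and the pseudo-labels as the abstract describes; the form of those losses is not given in the abstract and is left out of this sketch.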
