Paper Title
A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?
Paper Authors
Paper Abstract
Noisy labels are commonly present in data sets automatically collected from the internet, mislabeled by non-specialist annotators, or even by specialists in challenging tasks, such as in the medical field. Although deep learning models have shown significant improvements in different domains, an open issue is their tendency to memorize noisy labels during training, reducing their generalization potential. As deep learning models depend on correctly labeled data sets and label correctness is difficult to guarantee, it is crucial to account for the presence of noisy labels when training deep learning models. Several approaches have been proposed in the literature to improve the training of deep learning models in the presence of noisy labels. This paper presents a survey of the main techniques in the literature, in which we classify the algorithms into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches. We also present the experimental setups, data sets, and results commonly used by state-of-the-art models.
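As a brief illustration of the "robust losses" group named in the abstract: the mean absolute error (MAE) is a classic example of a loss that tolerates label noise better than cross-entropy, because its penalty is bounded, so a single confidently mislabeled sample cannot dominate training. The sketch below is illustrative only and is not taken from the surveyed paper; function names and values are assumptions.

```python
import numpy as np

def cross_entropy(probs, one_hot):
    # Standard CE: the penalty grows without bound as the model
    # becomes confident in a class the (noisy) label contradicts.
    return -np.sum(one_hot * np.log(probs + 1e-12))

def mae_loss(probs, one_hot):
    # MAE over class probabilities is bounded in [0, 2], limiting
    # the influence of any single mislabeled sample (a "robust loss").
    return np.sum(np.abs(probs - one_hot))

# A confident, correct-looking prediction that contradicts a noisy label:
probs = np.array([0.98, 0.01, 0.01])     # model favors class 0
noisy_label = np.array([0.0, 1.0, 0.0])  # annotator marked class 1

print(cross_entropy(probs, noisy_label))  # large: about 4.61
print(mae_loss(probs, noisy_label))       # bounded: 1.98
```

Under symmetric label noise, MAE-style losses are less prone to the memorization effect the abstract describes, at the cost of slower convergence on clean data; this trade-off motivates the combined approaches the survey also covers.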