Paper Title
Learning to Detect Noisy Labels Using Model-Based Features
Paper Authors
Paper Abstract
Label noise is ubiquitous in machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta-learning-based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose Selection-Enhanced Noisy label Training (SENT), which does not rely on meta-learning while retaining the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under the settings of self-training and label corruption.
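The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (corrupt a clean set with a noise distribution, then train a selector on model-based features), not the authors' method. Everything specific here is an assumption made for illustration: the class-conditional transition matrix, the three features (label loss, top confidence, agreement with the model's prediction), the logistic-regression selector, and all function names.

```python
import numpy as np


def transfer_noise(clean_labels, transition, rng):
    """Corrupt clean labels by sampling from an ASSUMED class-conditional
    transition matrix, where transition[i][j] = P(noisy = j | clean = i)."""
    num_classes = len(transition)
    noisy = np.array(
        [rng.choice(num_classes, p=transition[y]) for y in clean_labels]
    )
    is_noisy = (noisy != clean_labels).astype(float)  # 1.0 where the label was flipped
    return noisy, is_noisy


def model_based_features(probs, labels):
    """Hypothetical per-sample features computed from a model's predicted
    class probabilities and the (possibly noisy) labels."""
    p_label = probs[np.arange(len(labels)), labels]
    loss = -np.log(p_label + 1e-12)                    # cross-entropy of the given label
    confidence = probs.max(axis=1)                     # model's top probability
    agrees = (probs.argmax(axis=1) == labels).astype(float)
    return np.column_stack([loss, confidence, agrees])


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_selector(features, is_noisy, lr=0.1, steps=3000):
    """Fit a logistic regression by plain gradient descent.
    The selector estimates P(label is noisy | features)."""
    x = np.column_stack([features, np.ones(len(features))])  # append bias column
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = _sigmoid(x @ w)
        w -= lr * x.T @ (p - is_noisy) / len(x)
    return w


def predict_noisy(w, features):
    """Return the selector's estimated probability that each label is noisy."""
    x = np.column_stack([features, np.ones(len(features))])
    return _sigmoid(x @ w)
```

In this sketch, the selector would be trained on the corrupted clean set, where the noisy/clean indicator is known by construction, and then applied to the real noisy data to decide which samples to keep or down-weight. How the noise distribution is actually estimated and which features are used are questions the abstract leaves open.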