Paper title
To Aggregate or Not? Learning with Separate Noisy Labels
Paper authors
Abstract
Raw training data often comes with separate noisy labels collected from multiple imperfect annotators (e.g., via crowdsourcing). A typical way of using these separate labels is to first aggregate them into a single label and then apply standard training methods. The literature has also extensively studied effective aggregation approaches. This paper revisits this choice and aims to answer the question of whether one should aggregate separate noisy labels into single ones or use them separately as given. We theoretically analyze the performance of both approaches under the empirical risk minimization framework for a number of popular loss functions, including those designed specifically for learning with noisy labels. Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high or when the number of labelers/annotations is insufficient. Extensive empirical results validate our conclusions.
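The two options contrasted in the abstract can be illustrated with a minimal sketch. It assumes majority voting as the aggregation rule, which is only one common choice and not necessarily the one the paper analyzes; the function names, the tie-breaking rule, and the toy data are hypothetical.

```python
from collections import Counter


def aggregate_majority(labels_per_example):
    """Label aggregation: collapse each example's noisy labels into one by majority vote."""
    aggregated = []
    for labels in labels_per_example:
        counts = Counter(labels)
        top = max(counts.values())
        # Assumption: ties are broken by taking the smallest label value.
        aggregated.append(min(label for label, c in counts.items() if c == top))
    return aggregated


def separate(features, labels_per_example):
    """Label separation: keep every noisy label as its own (feature, label) pair."""
    return [(x, y) for x, labels in zip(features, labels_per_example) for y in labels]


# Toy data: two examples, each labeled by three hypothetical annotators.
features = ["x1", "x2"]
noisy_labels = [[1, 1, 0], [0, 1, 0]]

# Aggregation yields one training pair per example.
aggregated_pairs = list(zip(features, aggregate_majority(noisy_labels)))

# Separation yields one training pair per annotation.
separated_pairs = separate(features, noisy_labels)
```

With aggregation, the training set has one pair per example; with separation, it has one pair per annotation, so the same example can appear with conflicting labels. Either set can then be passed to a standard training procedure.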