Rethinking the Value of Labels for Improving Class-Imbalanced Learning

Paper Title

Rethinking the Value of Labels for Improving Class-Imbalanced Learning

Authors

Yang, Yuzhe; Xu, Zhi

Abstract

Real-world data often exhibits long-tailed distributions with heavy class imbalance, posing great challenges for deep recognition models. We identify a persistent dilemma regarding the value of labels in imbalanced learning. On the one hand, supervision from labels typically leads to better results than unsupervised counterparts. On the other hand, heavily imbalanced data naturally incurs "label bias" in the classifier, where the majority classes can drastically alter the decision boundary. In this work, we systematically investigate these two facets of labels. We demonstrate, theoretically and empirically, that class-imbalanced learning can benefit significantly from both semi-supervised and self-supervised learning. Specifically, we confirm two points. (1) Positively, imbalanced labels are valuable: given more unlabeled data, the original labels can be leveraged together with the extra data to reduce label bias in a semi-supervised manner, which greatly improves the final classifier. (2) Negatively, however, we argue that imbalanced labels are not always useful: classifiers that are first pre-trained in a self-supervised manner consistently outperform their corresponding baselines. Extensive experiments on large-scale imbalanced datasets verify our theoretically grounded strategies and show superior performance over previous state-of-the-art methods. Our intriguing findings highlight the need to rethink the use of imbalanced labels in realistic long-tailed tasks. Code is available at https://github.com/YyzHarry/imbalanced-semi-self.
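The toy sketch below illustrates the semi-supervised route summarized in point (1). It has three steps: train on the imbalanced labels, pseudo-label the extra unlabeled data with that classifier, then retrain on the union. It is a minimal sketch under several assumptions. A nearest-class-mean classifier stands in for a deep network. The function names `fit_centroids`, `predict`, and `pseudo_label_and_retrain` are hypothetical and are not taken from the authors' repository. The `unlabeled_weight` parameter, which down-weights pseudo-labeled points, is likewise an illustrative assumption.

```python
import numpy as np


def fit_centroids(X, y, num_classes, weights=None):
    """Nearest-class-mean 'training': one (weighted) centroid per class.

    Assumes every class in range(num_classes) has at least one sample.
    """
    if weights is None:
        weights = np.ones(len(X))
    centroids = []
    for c in range(num_classes):
        mask = y == c
        w = weights[mask]
        # Weighted mean of the points currently assigned to class c.
        centroids.append((w[:, None] * X[mask]).sum(axis=0) / w.sum())
    return np.stack(centroids)


def predict(centroids, X):
    """Assign each row of X to the class with the closest centroid."""
    # Squared Euclidean distance between every point and every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def pseudo_label_and_retrain(X_lab, y_lab, X_unlab, num_classes,
                             unlabeled_weight=1.0):
    """Hypothetical three-step semi-supervised pipeline (illustrative only)."""
    # Step 1: intermediate classifier trained on the imbalanced labels alone.
    intermediate = fit_centroids(X_lab, y_lab, num_classes)
    # Step 2: pseudo-label the extra unlabeled data with that classifier.
    y_pseudo = predict(intermediate, X_unlab)
    # Step 3: retrain on labeled + pseudo-labeled data; pseudo-labeled
    # points are weighted by `unlabeled_weight` (an assumption).
    X_all = np.concatenate([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    weights = np.concatenate([np.ones(len(X_lab)),
                              np.full(len(X_unlab), unlabeled_weight)])
    return fit_centroids(X_all, y_all, num_classes, weights)


# Toy imbalanced example: two labeled points for class 0, one for class 1.
X_lab = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
y_lab = np.array([0, 0, 1])
X_unlab = np.array([[10.0, 11.0], [1.0, 0.0]])
final = pseudo_label_and_retrain(X_lab, y_lab, X_unlab, num_classes=2)
print(final)
```

On this toy data, the intermediate classifier pseudo-labels [10, 11] as class 1 and [1, 0] as class 0. With the default weight, the final centroids become [1/3, 1/3] and [10, 10.5]. The nearest-class-mean model is chosen only so the example runs with NumPy alone. It shows the data flow of pseudo-labeling and does not reproduce the paper's deep-network training, its theory, or its results.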
