Paper title
Automated Imbalanced Learning
Paper authors
Abstract
Automated Machine Learning (AutoML) has become very successful at automating the time-consuming, iterative tasks of machine learning model development. However, current methods struggle when the data is imbalanced. Many real-world datasets are naturally imbalanced, and handling this improperly can lead to useless models, so the issue deserves careful treatment. This paper first introduces a new benchmark to study how different AutoML methods are affected by label imbalance. Second, we propose strategies to better deal with imbalance and integrate them into an existing AutoML framework. Finally, we present a systematic study that evaluates the impact of these strategies and find that including them in AutoML systems significantly increases robustness against label imbalance.