Paper Title

Balanced Split: A new train-test data splitting strategy for imbalanced datasets

Author

Khan, Azal Ahmad

Abstract

Classification datasets with skewed class proportions are called imbalanced. Class imbalance is a problem because most machine learning classification algorithms are built on the assumption that all classes are equally represented in the training dataset. To counter the class imbalance problem, many algorithm-level and data-level approaches have been developed, mainly ensemble learning and data augmentation techniques. This paper presents a new way to counter the problem through a new data-splitting strategy called Balanced Split. Data splitting can play an important role in correctly classifying imbalanced datasets. We show that the commonly used data-splitting strategies have some disadvantages, and that our proposed Balanced Split addresses them.
