Paper Title

A Hybrid Approach for Binary Classification of Imbalanced Data

Authors

Tsai, Hsin-Han; Yang, Ta-Wei; Wong, Wai-Man; Chou, Cheng-Fu

Abstract

Binary classification with an imbalanced dataset is challenging. Models tend to assign all samples to the majority class. Although existing solutions such as sampling methods, cost-sensitive methods, and ensemble learning methods improve the poor accuracy on the minority class, these methods are limited by overfitting problems or by cost parameters that are difficult to determine. We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning with deep neural network classifiers. We evaluate its performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC. The results show that our model outperforms state-of-the-art methods.
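The abstract names recall, G-mean, and AUC without defining them. The sketch below assumes the standard definitions (recall = TP / (TP + FN), G-mean = sqrt(recall × specificity), AUC as the probability that a random minority sample outscores a random majority sample); the paper's exact formulation may differ. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def recall_and_g_mean(y_true, y_pred, positive=1):
    """Return (recall, G-mean) for binary labels.

    Assumes the standard definitions; `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, math.sqrt(recall * specificity)


def auc(y_true, scores, positive=1):
    """Pairwise (rank-based) AUC; ties between scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A model that predicts the majority class for everything.
all_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, all_majority)) / len(y_true)
recall, g_mean = recall_and_g_mean(y_true, all_majority)
print(accuracy, recall, g_mean)
```

On this toy data the all-majority predictor reaches 80% accuracy while its recall and G-mean are both zero, which is why imbalanced-data work usually reports these metrics rather than plain accuracy.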
