Paper title
The Influence of Multiple Classes on Learning Online Classifiers from Imbalanced and Concept Drifting Data Streams
Paper authors
Paper abstract
This work experimentally studies the influence of local data characteristics and drifts on the difficulty of learning various online classifiers from multi-class imbalanced data streams. First, we present a categorization of these data factors and drifts in the context of imbalanced streams; then we introduce generators of synthetic streams that model these factors and drifts. The results of many experiments with synthetically generated data streams show that overlapping between multiple minority classes (the borderline type of examples) plays a much greater role than in streams with one minority class. The presence of rare examples in the stream is the most difficult single factor. The local drift of splitting minority classes is the third most influential factor. Unlike for binary streams, the specialized UOB and OOB classifiers perform well enough even for high imbalance ratios. The most challenging for all classifiers are complex scenarios that integrate the drifts of the identified factors simultaneously; these worsen the evaluation measures more strongly in the case of several minority classes than for binary ones. This is an extended version of the short paper presented at the LIDTA'2022 workshop at ECMLPKDD2022.