Paper Title

Similarity Based Stratified Splitting: an approach to train better classifiers

Authors

Felipe Farias, Teresa Ludermir, Carmelo Bastos-Filho

Abstract

We propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses both output-space and input-space information to split the data. The splits are generated using similarity functions among samples so that similar samples are placed in different splits. This approach allows for a better representation of the data in the training phase and leads to a more realistic performance estimate when the model is used in real-world applications. We evaluate our proposal on twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest, and K-Nearest Neighbors, and five similarity functions: Cityblock, Chebyshev, Cosine, Correlation, and Euclidean. According to the Wilcoxon signed-rank test, our approach consistently outperformed ordinary stratified 10-fold cross-validation in 75% of the assessed scenarios.
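
The splitting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `similarity_stratified_folds`, the greedy pivot-and-nearest-neighbours grouping, and the fold rotation are all assumptions made for illustration. Within each class (the output-space information), a pivot sample is drawn, its nearest remaining neighbours are found with a distance function (the input-space information), and every member of that group of similar samples is sent to a different fold.

```python
import random
from collections import defaultdict


def cityblock(a, b):
    """Cityblock (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_stratified_folds(X, y, n_folds=3, distance=cityblock, seed=0):
    """Return a fold index in range(n_folds) for every sample.

    Hypothetical sketch: similar samples of the same class are spread
    over different folds, so each fold sees every region of each class.
    """
    rng = random.Random(seed)
    folds = [None] * len(X)

    # Stratify: handle each class separately.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    offset = 0  # rotates the starting fold so leftovers stay balanced
    for members in by_class.values():
        remaining = list(members)
        while remaining:
            # Draw a pivot, then rank the rest by distance to it.
            pivot = remaining.pop(rng.randrange(len(remaining)))
            remaining.sort(key=lambda j: distance(X[pivot], X[j]))
            # The pivot and its nearest neighbours form one group ...
            group = [pivot] + remaining[: n_folds - 1]
            remaining = remaining[n_folds - 1:]
            # ... and each member of the group goes to a distinct fold.
            for k, idx in enumerate(group):
                folds[idx] = (offset + k) % n_folds
            offset += len(group)
    return folds


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    y = [0, 0, 0, 0, 0, 0]
    print(similarity_stratified_folds(X, y, n_folds=3))
```

Any of the other distances named in the abstract (Chebyshev, Cosine, Correlation, Euclidean) could be passed as `distance`; only Cityblock is written out here to keep the sketch dependency-free.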
