Paper Title

Optimally Combining Classifiers for Semi-Supervised Learning

Authors

Zhiguo Wang, Liusha Yang, Feng Yin, Ke Lin, Qingjiang Shi, Zhi-Quan Luo

Abstract

This paper considers semi-supervised learning for tabular data. It is widely known that XGBoost, based on tree models, works well on heterogeneous features, while the transductive support vector machine can exploit the low-density separation assumption. However, little work has been done to combine the two for end-to-end semi-supervised learning. In this paper, we find that these two methods have complementary properties and greater diversity, which motivates us to propose a new semi-supervised learning method that adaptively combines the strengths of XGBoost and the transductive support vector machine. Instead of a majority-vote rule, an optimization problem over the ensemble weights is formulated, which helps to obtain more accurate pseudo labels for the unlabeled data. Experimental results on UCI data sets and a real commercial data set demonstrate the superior classification performance of our method over five state-of-the-art algorithms, improving test accuracy by about $3\%$-$4\%$. Partial code is available at https://github.com/hav-cam-mit/CTO.
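To make the core idea concrete, below is a minimal sketch (not the authors' implementation; see the repository above for the real code) of weighting the probability outputs of an XGBoost model and an SVM-style model with a single ensemble weight, chosen to maximize the average prediction margin on the unlabeled pool, and then assigning pseudo labels. The plain probabilistic SVC stands in for the transductive SVM, the grid search stands in for the paper's optimization problem, and the 0.9 confidence threshold is an assumed value for illustration.

```python
# Sketch: weighted ensemble of XGBoost and an SVM for pseudo-labeling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Toy data: a small labeled set and a larger unlabeled pool (assumption).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_lab, y_lab, X_unl = X[:100], y[:100], X[100:]

xgb = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
svm = SVC(kernel="rbf", probability=True, random_state=0)
xgb.fit(X_lab, y_lab)
svm.fit(X_lab, y_lab)

p_xgb = xgb.predict_proba(X_unl)  # shape (n_unlabeled, n_classes)
p_svm = svm.predict_proba(X_unl)

def avg_margin(p):
    """Mean gap between the top-1 and top-2 class probabilities."""
    top2 = np.sort(p, axis=1)[:, -2:]
    return np.mean(top2[:, 1] - top2[:, 0])

# Grid search over the ensemble weight instead of a fixed majority vote
# (the paper formulates this step as an optimization problem).
weights = np.linspace(0.0, 1.0, 21)
best_w = max(weights, key=lambda w: avg_margin(w * p_xgb + (1 - w) * p_svm))

p_ens = best_w * p_xgb + (1 - best_w) * p_svm
pseudo_labels = p_ens.argmax(axis=1)
confident = p_ens.max(axis=1) > 0.9  # keep only high-confidence pseudo labels (assumed threshold)
print(f"w = {best_w:.2f}, pseudo-labeled {confident.sum()} of {len(X_unl)} samples")
```

In a full self-training loop, the confidently pseudo-labeled samples would be added to the labeled set and both models retrained; the sketch only shows a single round of weight selection and labeling.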
