Paper Title
Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
Paper Authors
Paper Abstract
The need to learn from positive and unlabeled data, or PU learning, arises in many applications and has attracted increasing interest. While random forests are known to perform well on many tasks with positive and negative data, recent PU algorithms are generally based on deep neural networks, and the potential of tree-based PU learning is under-explored. In this paper, we propose new random forest algorithms for PU learning. Key to our approach is a new interpretation of decision tree algorithms for positive and negative data as \emph{recursive greedy risk minimization algorithms}. We extend this perspective to the PU setting to develop new decision tree learning algorithms that directly minimize PU-data-based estimators of the expected risk. This allows us to develop an efficient PU random forest algorithm, PU extra trees. Our approach features three desirable properties: it is robust to the choice of loss function, in the sense that various loss functions lead to the same decision trees; it requires little hyperparameter tuning compared to neural-network-based PU learning; and it supports a feature importance measure that directly quantifies a feature's contribution to risk minimization. Our algorithms demonstrate strong performance on several datasets. Our code is available at \url{https://github.com/puetpaper/PUExtraTrees}.
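
To make the risk-minimization view concrete, below is a minimal sketch, not the authors' implementation, of the standard unbiased PU risk estimator that such a tree learner could minimize greedily. It assumes a known class prior pi and uses the decomposition R(g) = pi*R_p^+(g) + R_u^-(g) - pi*R_p^-(g) (du Plessis et al.); all function and variable names here are illustrative.

    # A minimal sketch (not the paper's code) of the unbiased empirical
    # PU risk of a scorer g, given its scores on positive and unlabeled
    # samples. `prior` is the (assumed known) class prior pi; `loss(z)`
    # is a margin loss l(y * g(x)), zero-one by default.
    import numpy as np

    def pu_risk(scores_p, scores_u, prior,
                loss=lambda z: (z <= 0).astype(float)):
        risk_p_pos = loss(scores_p).mean()    # positives scored as label +1
        risk_p_neg = loss(-scores_p).mean()   # positives scored as label -1
        risk_u_neg = loss(-scores_u).mean()   # unlabeled scored as label -1
        return prior * risk_p_pos + risk_u_neg - prior * risk_p_neg

    # Sanity check: a constant "+1" node (every point scored +1) gives
    # pi*0 + 1 - pi*1 = 1 - pi, the expected fraction of negatives.
    # pu_risk(np.ones(100), np.ones(500), prior=0.4)  # -> 0.6

In the recursive greedy reading of tree induction, a candidate split would be scored by summing this estimator over the two child nodes, and the split with the lowest total estimated risk is kept; the recursion then continues in each child.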