Paper Title

Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Authors

Liu, Alice J., Mukherjee, Arpita, Hu, Linwei, Chen, Jie, Nair, Vijayan N.

Abstract

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determination of correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets; however, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias in PDPs, but the bias was especially problematic for RFs. The extent of the bias varied with the correlation among predictors, response type, and dataset sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as is to be expected, performances were better for continuous responses compared to binary data and with larger samples.
