Paper Title

Distributional Adaptive Soft Regression Trees

Paper Authors

Nikolaus Umlauf, Nadja Klein

Paper Abstract

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) while requiring only minimal hyperparameter tuning. They are built by aggregating multiple regression trees during training, which are usually computed recursively using hard splitting rules. Recently, regression forests have been incorporated into the framework of distributional regression, a nowadays popular regression approach aiming at estimating complete conditional distributions rather than relating only the mean of an output variable to input features, as done classically. This article proposes a new type of distributional regression tree using a multivariate soft split rule. One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree, while the complexity of the function is controlled adaptively via information criteria. Moreover, the search for the optimal split variable becomes obsolete. We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods, especially in the presence of complex non-linear feature interactions. Finally, we illustrate the usefulness of our approach with an example on probabilistic forecasts for the Sun's activity.
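The abstract gives no implementation details, so the following is only a minimal sketch of the general soft-split idea it refers to: instead of a hard left/right assignment on a single feature, each observation receives a smooth membership weight from a logistic gate over a linear combination of all features. The function and parameter names (soft_split_weights, beta, beta0) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def soft_split_weights(X, beta, beta0):
    """Illustrative multivariate soft split (assumed logistic gating):
    each row of X gets a weight in (0, 1) for the left child and the
    complementary weight for the right child, based on a linear
    combination of ALL features rather than a single split variable."""
    z = X @ beta + beta0                 # linear score per observation
    w_left = 1.0 / (1.0 + np.exp(-z))    # smooth membership for "left" child
    w_right = 1.0 - w_left               # smooth membership for "right" child
    return w_left, w_right

# Toy usage: 5 observations, 3 features, hypothetical split coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
beta = np.array([0.8, -1.2, 0.3])
w_left, w_right = soft_split_weights(X, beta, beta0=0.1)
print(np.round(w_left, 3))  # fractional memberships instead of 0/1 assignments
```

Under a hard split, each observation would fall entirely into one child according to a single feature threshold; the soft variant lets every observation contribute to both children with fractional weights, which is what allows a single tree to represent a smooth function.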
