Adaptive Histogram-Based Gradient Boosted Trees for Federated Learning

Paper Title

Adaptive Histogram-Based Gradient Boosted Trees for Federated Learning

Authors

Yuya Jeremy Ong, Yi Zhou, Nathalie Baracaldo, Heiko Ludwig

Abstract

Federated Learning (FL) is an approach to collaboratively training a model across multiple parties without sharing data between parties or with an aggregator. It is used both in the consumer domain, to protect personal data, and in enterprise settings, where data domicile regulation and the pragmatics of data silos are the main drivers. While gradient boosted tree implementations such as XGBoost have been very successful for many use cases, their federated learning adaptations tend to be very slow because they rely on cryptographic and privacy methods, and they have not seen widespread use. We propose Party-Adaptive XGBoost (PAX) for federated learning, a novel implementation of gradient boosting that uses a party-adaptive histogram aggregation method without the need for data encryption. It constructs a surrogate representation of the data distribution for finding splits of the decision tree. Our experimental results demonstrate strong model performance, especially on non-IID distributions, and significantly faster training run-time across different data sets than existing federated implementations. This approach makes the use of gradient boosted trees practical in enterprise federated learning.
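To make the idea of finding splits from aggregated histograms concrete, the following is a minimal, generic sketch of histogram-based split finding across parties. It is not the PAX algorithm: the abstract does not specify how the party-adaptive aggregation works, so the sketch assumes that all parties share fixed bin edges agreed in advance. The function names (`local_histogram`, `best_split`, `federated_best_split`), the regularisation parameter `lam`, and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def local_histogram(x, grad, hess, edges):
    """One party's per-bin gradient and Hessian sums for a single feature.

    Assumption: `edges` is a shared, sorted array of bin edges.
    """
    n_bins = len(edges) - 1
    # Interior edges map each value to a bin index in 0..n_bins-1.
    bins = np.digitize(x, edges[1:-1])
    g = np.bincount(bins, weights=grad, minlength=n_bins)
    h = np.bincount(bins, weights=hess, minlength=n_bins)
    return g, h

def best_split(g, h, edges, lam=1.0):
    """Scan bin boundaries using an XGBoost-style gain.

    The constant factor 1/2 and the complexity penalty are omitted.
    """
    G, H = g.sum(), h.sum()
    gl, hl = np.cumsum(g)[:-1], np.cumsum(h)[:-1]  # left-child sums
    gr, hr = G - gl, H - hl                        # right-child sums
    gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
    k = int(np.argmax(gain))
    return float(edges[k + 1]), float(gain[k])

def federated_best_split(parties, edges, lam=1.0):
    """Aggregator side: sum the parties' histograms, then pick a split.

    Only per-bin sums leave a party; raw feature values stay local.
    """
    hists = [local_histogram(x, g, h, edges) for x, g, h in parties]
    g_total = np.sum([g for g, _ in hists], axis=0)
    h_total = np.sum([h for _, h in hists], axis=0)
    return best_split(g_total, h_total, edges, lam)

# Toy usage: two parties holding disjoint value ranges (a non-IID split).
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
party_a = (np.array([0.5, 1.5]), np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
party_b = (np.array([2.5, 3.5]), np.array([1.0, 1.0]), np.array([1.0, 1.0]))
threshold, gain = federated_best_split([party_a, party_b], edges)
print(threshold, gain)
```

Because per-bin sums are additive, the merged histogram equals the one a single party would compute on the pooled data, so under the shared-edge assumption the chosen split matches the centralised one.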
