Paper Title
OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization
Paper Authors
Paper Abstract
Vertical Federated Learning (FL) is a new paradigm that enables users holding non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that it is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on privacy-preserving tree boosting algorithms under vertical FL. Existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses the above problems, it leads to low accuracy of the trained model. This paper explores improving the accuracy of the widely deployed tree boosting algorithms while satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of the desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy of trained models than existing LDP approaches under reasonable settings. Our code is open source.
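For readers unfamiliar with the dLDP variant mentioned in the abstract, the following is a minimal sketch of the standard distance-based (metric) LDP guarantee that such desensitization algorithms are typically required to satisfy; the notation below is an assumed, common formulation and is not quoted from the paper's exact dLDP definition.

% Assumed, common formulation of distance-based LDP (metric LDP); notation is
% illustrative and may differ from the paper's optimized dLDP definition.
% A randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-dLDP if, for any two
% inputs $x, x'$ in the attribute domain and every possible output $y$,
\[
  \Pr[\mathcal{M}(x) = y] \;\le\; e^{\epsilon \cdot d(x, x')} \cdot \Pr[\mathcal{M}(x') = y],
\]
% where $d(x, x')$ is a distance metric on the domain (e.g., $|x - x'|$ for a
% numerical attribute).

Under this kind of guarantee, inputs that are far apart are allowed to be more distinguishable than nearby inputs, which corresponds to the trade-off between the privacy of pairs with large distance and the utility of desensitized values described in the abstract.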