Paper Title
GCF: Generalized Causal Forest for Heterogeneous Treatment Effect Estimation in Online Marketplace
Paper Authors
Paper Abstract
Uplift modeling is a rapidly growing approach that combines causal inference and machine learning to directly estimate heterogeneous treatment effects. In recent years it has been widely applied in online marketplaces to support large-scale decision-making. Existing popular models, such as causal forest (CF), are limited either to discrete treatments or to parametric assumptions on the outcome-treatment relationship, which risk model misspecification. However, continuous treatments (e.g., price, duration) often arise in marketplaces. To address these limitations, we use a kernel-based doubly robust estimator to recover non-parametric dose-response functions that can flexibly model continuous treatment effects. Moreover, we propose a generic distance-based splitting criterion to capture heterogeneity under continuous treatments. We call the proposed algorithm generalized causal forest (GCF), as it generalizes the use case of CF to a much broader setting. We show the effectiveness of GCF by deriving the asymptotic property of the estimator and comparing it with popular uplift modeling methods on both synthetic and real-world datasets. We implement GCF on Spark and have successfully deployed it in a large-scale online pricing system at a leading ride-sharing company. Online A/B testing results further validate the superiority of GCF.
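To make the idea of a kernel-based doubly robust estimator of a dose-response function more concrete, the following is a minimal sketch of one common form from the continuous-treatment literature, not the paper's Spark implementation. It estimates E[Y(t)] at a single treatment level t by combining an outcome model mu(t, x) with a kernel-weighted residual correction divided by the conditional treatment density f(t | x). The function names, the Gaussian kernel, and the convention that both nuisance models are passed in as pre-fitted callables are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kernel_dr_dose_response(t, X, T, Y, outcome_model, density_model, bandwidth):
    """Kernel-based doubly robust estimate of E[Y(t)] at treatment level t.

    Hypothetical interface (not from the paper):
      outcome_model(t_vec, X) -> estimate of E[Y | T = t, X] per sample
      density_model(t_vec, X) -> estimate of the conditional density f(t | X)
    Both are assumed to be fitted elsewhere and to accept NumPy arrays.
    """
    T = np.asarray(T, dtype=float)
    Y = np.asarray(Y, dtype=float)
    t_vec = np.full_like(T, t, dtype=float)

    # Plug-in part: predicted outcome if every unit received treatment t.
    mu_t = outcome_model(t_vec, X)

    # Kernel weights localize the observed treatments around t.
    k_weights = gaussian_kernel((T - t) / bandwidth) / bandwidth

    # Inverse density weighting of the residuals corrects outcome-model error.
    density_t = density_model(t_vec, X)
    correction = k_weights / density_t * (Y - mu_t)

    return float(np.mean(mu_t + correction))
```

Evaluating the function over a grid of t values traces out an estimated dose-response curve. The estimate remains approximately correct when either the outcome model or the density model is accurate, which is the sense in which it is doubly robust; the bandwidth trades smoothing bias against variance.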