Title
Double machine learning for sample selection models
Authors
Abstract
This paper considers the evaluation of discretely distributed treatments when outcomes are observed only for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high-dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which make treatment effect estimation robust to moderate regularization biases in the machine-learning-based estimation of the outcome, treatment, or sample selection models, and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners, and we investigate their finite sample properties in a simulation study. We also apply the proposed methodology to the Job Corps data to evaluate the effect of training on hourly wages, which are observed only conditional on employment. The estimator is available in the causalweight package for the statistical software R.
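To make the two ingredients named in the abstract concrete (a doubly robust score and cross-fitting), here is a minimal Python sketch. It covers only the simplest case the abstract mentions: a binary treatment D, where both treatment assignment and sample selection S are assumed to depend only on observed pre-treatment covariates X. It does not cover the instrumental variable case or dynamic confounding. As we understand that case, the score for the mean potential outcome under treatment value d combines three nuisance functions. These are an outcome regression mu_d(X) = E[Y | D=d, S=1, X], a treatment propensity p_d(X) = Pr(D=d | X), and a selection probability pi_d(X) = Pr(S=1 | D=d, X), combined as mu_d(X) + 1{D=d} * S * (Y - mu_d(X)) / (p_d(X) * pi_d(X)). Several elements are hypothetical illustrations, not the causalweight implementation: the name `dml_ate_mar`, the trimming threshold, the two-fold split, and the use of plain linear and logistic regressions in place of machine learners.

```python
import numpy as np


def _add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def _ols(X_fit, y_fit, X_new):
    # Linear outcome regression (a simple stand-in for a machine learner).
    beta, *_ = np.linalg.lstsq(_add_intercept(X_fit), y_fit, rcond=None)
    return _add_intercept(X_new) @ beta


def _logit(X_fit, y_fit, X_new, iters=25):
    # Logistic regression via Newton-Raphson (stand-in for a machine learner).
    Z = _add_intercept(X_fit)
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        hess = Z.T @ (Z * (p * (1 - p))[:, None]) + 1e-8 * np.eye(Z.shape[1])
        beta += np.linalg.solve(hess, Z.T @ (y_fit - p))
    return 1.0 / (1.0 + np.exp(-_add_intercept(X_new) @ beta))


def dml_ate_mar(y, d, s, x, n_folds=2, trim=0.01, seed=0):
    """Hypothetical cross-fitted doubly robust ATE when Y is seen only if S == 1.

    Assumes D and S are both ignorable given pre-treatment covariates x.
    """
    n = len(d)
    y = np.where(s == 1, y, 0.0)  # outcomes of unselected units are never used
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.zeros(n)  # per-unit score for E[Y(1)] - E[Y(0)]
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Cross-fitting: nuisances are fit on the other folds, evaluated on fold k.
        p1 = np.clip(_logit(x[train], d[train], x[test]), trim, 1 - trim)
        for treat, sign, p_d in ((1, 1.0, p1), (0, -1.0, 1 - p1)):
            arm = train & (d == treat)
            # Selection probability Pr(S=1 | D=treat, X)
            pi = np.clip(_logit(x[arm], s[arm], x[test]), trim, 1.0)
            # Outcome regression E[Y | D=treat, S=1, X], fit on selected units only
            sel = arm & (s == 1)
            mu = _ols(x[sel], y[sel], x[test])
            # Inverse-probability weight is zero for S=0 units and other arms
            weight = (d[test] == treat) * s[test] / (p_d * pi)
            psi[test] += sign * (mu + weight * (y[test] - mu))
    ate = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return ate, se


# Simulated example: true ATE is 1; selection depends on D and X only.
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=(n, 2))
d = (rng.random(n) < 1 / (1 + np.exp(-0.5 * x[:, 0]))).astype(int)
s = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + 0.5 * d + 0.5 * x[:, 1])))).astype(int)
y = d + x[:, 0] + x[:, 1] + rng.normal(size=n)
y[s == 0] = np.nan  # e.g. wages unobserved for the non-employed
ate, se = dml_ate_mar(y, d, s, x)
print(f"ATE = {ate:.3f} (SE {se:.3f})")
```

The score stays unbiased if either the outcome regression or the propensity-times-selection weights are correctly specified. Each nuisance prediction for a unit comes from models that never saw that unit, which is the role cross-fitting plays. Replacing `_ols` and `_logit` with flexible learners (for example, lasso or random forests) would bring the sketch closer to the machine-learning setting the paper studies.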