Paper Title


Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation

Paper Authors

Rehill, Patrick; Biddle, Nicholas

Abstract


Methods for learning optimal policies use causal machine learning models to create human-interpretable rules for making choices around the allocation of different policy interventions. However, in realistic policy-making contexts, decision-makers often care about trade-offs between outcomes, not just single-mindedly maximising utility for one outcome. This paper proposes an approach termed Multi-Objective Policy Learning (MOPoL) which combines optimal decision trees for policy learning with a multi-objective Bayesian optimisation approach to explore the trade-offs between multiple outcomes. It does this by building a Pareto frontier of non-dominated models across different hyperparameter settings that govern outcome weighting. The key insight is that a low-cost greedy tree can be an accurate proxy for the very computationally costly optimal tree for the purposes of making decisions, which means models can be repeatedly fit to learn a Pareto frontier. The method is applied to a real-world case study of non-price rationing of anti-malarial medication in Kenya.
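The Pareto-frontier construction described above can be sketched as a non-dominated filter over candidate models: each candidate policy tree, fitted under a different outcome-weighting hyperparameter, is scored on every outcome, and only models not dominated on all outcomes are kept. The sketch below is illustrative only (the scores and function name are hypothetical, not the authors' code) and assumes higher scores are better on both outcomes.

```python
def pareto_frontier(points):
    """Return the non-dominated points, maximising every outcome.

    A point is dominated if some other point is at least as good on
    every outcome and strictly better on at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            all(q[i] >= p[i] for i in range(len(p)))
            and any(q[i] > p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Hypothetical (outcome_1, outcome_2) scores for candidate policy trees,
# each fitted under a different outcome-weighting hyperparameter.
scores = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4), (0.8, 0.2)]
print(pareto_frontier(scores))
# (0.4, 0.4) is dominated by (0.5, 0.5) and is dropped.
```

In the full method, the Bayesian optimisation loop would propose new weight settings, fit a cheap greedy tree as a proxy for the optimal tree at each setting, and apply a filter like this to the resulting outcome scores.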
