Paper Title


Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation

Paper Authors

Rehill, Patrick; Biddle, Nicholas

Abstract


Methods for learning optimal policies use causal machine learning models to create human-interpretable rules for making choices around the allocation of different policy interventions. However, in realistic policy-making contexts, decision-makers often care about trade-offs between outcomes, not just single-mindedly maximising utility for one outcome. This paper proposes an approach termed Multi-Objective Policy Learning (MOPoL) which combines optimal decision trees for policy learning with a multi-objective Bayesian optimisation approach to explore the trade-offs between multiple outcomes. It does this by building a Pareto frontier of non-dominated models across different hyperparameter settings that govern outcome weighting. The key insight is that a low-cost greedy tree can be an accurate proxy for the very computationally costly optimal tree for the purposes of making decisions, which means models can be repeatedly fit to learn a Pareto frontier. The method is applied to a real-world case study of non-price rationing of anti-malarial medication in Kenya.
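The Pareto-frontier construction described above can be sketched as a non-dominated filter over candidate models: each candidate policy tree, fitted under a different outcome-weighting hyperparameter, is scored on every outcome, and only models not dominated on all outcomes are kept. The sketch below is illustrative only (the scores and function name are hypothetical, not the authors' code) and assumes higher scores are better on both outcomes.

```python
def pareto_frontier(points):
    """Return the non-dominated points, maximising every outcome.

    A point is dominated if some other point is at least as good on
    every outcome and strictly better on at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            all(q[i] >= p[i] for i in range(len(p)))
            and any(q[i] > p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Hypothetical (outcome_1, outcome_2) scores for candidate policy trees,
# each fitted under a different outcome-weighting hyperparameter.
scores = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4), (0.8, 0.2)]
print(pareto_frontier(scores))
# (0.4, 0.4) is dominated by (0.5, 0.5) and is dropped.
```

In the full method, the Bayesian optimisation loop would propose new weight settings, fit a cheap greedy tree as a proxy for the optimal tree at each setting, and apply a filter like this to the resulting outcome scores.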
