Counterfactual harm

Paper title

Counterfactual harm

Authors

Jonathan G. Richens, Rory Beard, Daniel H. Thompson

Abstract

To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized control trial data. We find that the standard method of selecting doses using treatment effects results in unnecessarily harmful doses, while our counterfactual approach allows us to identify doses that are significantly less harmful without sacrificing efficacy.
