Title
Budgeted and Non-budgeted Causal Bandits
Authors
Abstract
Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified. If there are no backdoor paths from an intervenable node to the reward node, then we propose an algorithm to minimize simple regret that optimally trades off observations and interventions based on the cost of intervention. We also propose an algorithm that accounts for the cost of interventions, utilizes causal side-information, and minimizes the expected cumulative regret without exceeding the budget. Our cumulative-regret minimization algorithm performs better than standard algorithms that do not take side-information into account. Finally, we study the problem of learning the best interventions without a budget constraint in general graphs, and give an algorithm that achieves constant expected cumulative regret in terms of the instance parameters when the parent distribution of the reward variable under each intervention is known. Our results are experimentally validated and compared to the best-known bounds in the current literature.
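To illustrate the setting the abstract describes, the following is a minimal toy sketch, not the paper's algorithm: a two-node graph X → Y with no backdoor path, so the observational conditional P(Y | X=x) equals the interventional distribution P(Y | do(X=x)), and cheap observations can be traded off against costly interventions under a budget. All names, costs, and the half-budget split are illustrative assumptions.

```python
import random

def run_budgeted_bandit(budget=1000, intervention_cost=5, seed=0):
    """Toy budgeted causal bandit on X -> Y with no backdoor path.

    Observations cost 1 unit, interventions cost `intervention_cost`.
    With no backdoor path, P(Y | do(X=x)) = P(Y | X=x), so samples
    from the observational regime also estimate interventional rewards.
    """
    rng = random.Random(seed)
    p_x = 0.3                       # P(X = 1) under pure observation
    p_y_given_x = {0: 0.4, 1: 0.7}  # P(Y = 1 | X = x); arm 1 is optimal

    counts = {0: 0, 1: 0}
    successes = {0: 0, 1: 0}
    spent = 0

    # Phase 1: spend half the budget on cheap observational samples.
    while spent + 1 <= budget / 2:
        x = 1 if rng.random() < p_x else 0
        y = 1 if rng.random() < p_y_given_x[x] else 0
        counts[x] += 1
        successes[x] += y
        spent += 1

    # Phase 2: spend the rest on interventions do(X=x), targeting the
    # arm seen least often so both reward estimates become reliable.
    while spent + intervention_cost <= budget:
        x = min(counts, key=counts.get)
        y = 1 if rng.random() < p_y_given_x[x] else 0
        counts[x] += 1
        successes[x] += y
        spent += intervention_cost

    means = {x: successes[x] / max(counts[x], 1) for x in counts}
    best_arm = max(means, key=means.get)
    simple_regret = max(p_y_given_x.values()) - p_y_given_x[best_arm]
    return best_arm, simple_regret, spent

best_arm, regret, spent = run_budgeted_bandit()
```

The key point of the sketch is the trade-off: because observations are cheaper, the algorithm buys many observational samples first and reserves the expensive interventions for the arm the observational regime under-sampled, all while never exceeding the budget.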