Paper Title
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Paper Authors
Paper Abstract
Off-policy evaluation in reinforcement learning offers the chance to use observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high-stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data, and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal would have a large effect on the OPE estimate, and by formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exact influence functions for fitted Q-evaluation with two different function classes, kernel-based and linear least squares, as well as for importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and to make evaluation more robust.
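The core idea of highlighting influential transitions can be illustrated with a minimal leave-one-out sketch: refit a linear least-squares value estimate with each transition removed and measure how much the estimate moves. This is an assumption-laden toy (synthetic data, a plain regression standing in for fitted Q-evaluation, a hypothetical query point `x_eval` standing in for the OPE estimate), not the paper's exact influence-function computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for transition data: feature vectors of (state, action)
# pairs and observed returns. In the paper's setting these would come
# from fitted Q-evaluation on logged trajectories.
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=n)

def ls_estimate(X, y, x_eval):
    """Fit linear least squares and return the predicted value at x_eval."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_eval @ w

x_eval = np.ones(d)  # hypothetical query point standing in for the OPE estimate
full = ls_estimate(X, y, x_eval)

# Brute-force leave-one-out influence: change in the estimate when
# transition i is dropped from the fit. (Exact influence functions,
# as in the paper, avoid refitting n times.)
influence = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    influence[i] = ls_estimate(X[mask], y[mask], x_eval) - full

# Flag the most influential transitions for expert review.
top = np.argsort(-np.abs(influence))[:5]
```

The transitions indexed by `top` are the ones whose removal most perturbs the estimate, i.e., the candidates a hybrid human-AI workflow would surface to domain experts for validation.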