Paper Title

Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

Paper Authors

Jasmina Gajcin, Ivana Dusparic

Paper Abstract

While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While counterfactuals have been extensively researched in supervised learning, few methods apply them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of this powerful explanation method in RL. We start by reviewing the current work on counterfactual explanations in supervised learning. We then explore the differences between counterfactual explanations in supervised learning and in RL, and identify the main challenges that prevent the adoption of methods from supervised learning in reinforcement learning. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.
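The abstract's working definition of a counterfactual ("what can be changed in the input for the output of the black-box model to change") can be made concrete with a small search loop. The sketch below is illustrative only and not from the paper: it assumes a hypothetical scalar `score` black box (the model's confidence in the desired class) and toy loan-decision features, and greedily perturbs the input until the decision flips.

```python
import numpy as np

def find_counterfactual(score, x, step=0.05, max_iters=500, seed=0):
    """Hill-climbing sketch of counterfactual search: nudge one feature at a
    time, keep any perturbation that raises the target-class score, and stop
    once the black box flips its decision (score crosses 0.5)."""
    rng = np.random.default_rng(seed)
    cf, best = x.copy(), score(x)
    for _ in range(max_iters):
        if best > 0.5:                       # decision flipped: counterfactual found
            return cf
        i = rng.integers(len(cf))            # pick one feature to perturb
        candidate = cf.copy()
        candidate[i] += rng.choice([-step, step])
        s = score(candidate)
        if s > best:                         # greedy: keep only improving moves
            cf, best = candidate, s
    return None                              # no counterfactual within the budget

# Hypothetical black box (not from the paper): approve a loan iff
# income - debt > 1.0, exposed as a sigmoid score.
score = lambda z: 1.0 / (1.0 + np.exp(-(z[0] - z[1] - 1.0)))
x = np.array([1.2, 0.5])                     # rejected applicant: score(x) ~ 0.43
cf = find_counterfactual(score, x)
if cf is not None:
    print("change input from", x, "to", np.round(cf, 2), "to flip the decision")
```

The returned perturbed input is exactly the "actionable advice" the abstract describes (e.g., raise income or lower debt by a small amount); in the supervised setting a single input-output pair suffices, which is part of why, as the paper argues, such methods do not transfer directly to sequential RL decisions.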
