Paper Title

A Bayesian Account of Measures of Interpretability in Human-AI Interaction

Authors

Sarath Sreedharan, Anagha Kulkarni, Tathagata Chakraborti, David E. Smith, Subbarao Kambhampati

Abstract

Existing approaches for the design of interpretable agent behavior consider different measures of interpretability in isolation. In this paper we posit that, in the design and deployment of human-aware agents in the real world, notions of interpretability are just some among many considerations, and that techniques developed in isolation lack two key properties needed to be useful when considered together: they must be able to 1) deal with their mutually competing properties; and 2) handle an open world where the human is not just there to interpret behavior in one specific form. To this end, we consider three well-known instances of interpretable behavior studied in the existing literature -- namely, explicability, legibility, and predictability -- and propose a revised model in which all these behaviors can be meaningfully modeled together. We highlight interesting consequences of this unified model and motivate, through the results of a user study, why this revision is necessary.
