Paper Title

A Bayesian Account of Measures of Interpretability in Human-AI Interaction

Authors

Sarath Sreedharan, Anagha Kulkarni, Tathagata Chakraborti, David E. Smith, Subbarao Kambhampati

Abstract

Existing approaches for the design of interpretable agent behavior consider different measures of interpretability in isolation. In this paper we posit that, in the design and deployment of human-aware agents in the real world, notions of interpretability are just some among many considerations, and that techniques developed in isolation lack two key properties needed to be useful when considered together: they must be able to 1) deal with their mutually competing properties; and 2) handle an open world where the human is not just there to interpret behavior in one specific form. To this end, we consider three well-known instances of interpretable behavior studied in the existing literature -- namely, explicability, legibility, and predictability -- and propose a revised model in which all these behaviors can be meaningfully modeled together. We highlight interesting consequences of this unified model and motivate, through the results of a user study, why this revision is necessary.
