Paper Title
Beware the Rationalization Trap! When Language Model Explainability Diverges from our Mental Models of Language
Paper Authors
Paper Abstract
Language models learn and represent language differently than humans; they learn the form and not the meaning. Thus, to assess the success of language model explainability, we need to consider the impact of its divergence from a user's mental model of language. In this position paper, we argue that in order to avoid harmful rationalization and achieve truthful understanding of language models, explanation processes must satisfy three main conditions: (1) explanations have to truthfully represent the model behavior, i.e., have a high fidelity; (2) explanations must be complete, as missing information distorts the truth; and (3) explanations have to take the user's mental model into account, progressively verifying a person's knowledge and adapting their understanding. We introduce a decision tree model to showcase potential reasons why current explanations fail to reach their objectives. We further emphasize the need for human-centered design to explain the model from multiple perspectives, progressively adapting explanations to changing user expectations.