Paper Title

TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

Paper Authors

João Bento, Pedro Saleiro, André F. Cruz, Mário A. T. Figueiredo, Pedro Bizarro

Paper Abstract

Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions only contributes on average to 41% of the model's score; iv) notably high attribution to client's age, suggesting a potential discriminatory reasoning, later confirmed as higher false positive rates for older clients.
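To make the idea of event-level perturbation attributions concrete, below is a minimal conceptual sketch, not the authors' TimeSHAP implementation: each timestep (event) of the sequence is treated as one KernelSHAP "feature", and switched-off events are replaced by a background event. The toy model `rnn_score`, the background event, and all shapes are illustrative assumptions.

```python
import numpy as np
import shap  # KernelSHAP implementation (pip install shap)

# Toy stand-in for a trained recurrent model: it maps a sequence of events
# (T timesteps x D features) to a single score in [0, 1]. In practice this
# would wrap the RNN's predict function; everything here is an illustrative
# assumption, not the authors' code or data.
T, D = 8, 3
rng = np.random.default_rng(0)
weights = rng.normal(size=(T, D))
x_sequence = rng.normal(size=(T, D))   # sequence to explain
background_event = np.zeros(D)         # replaces "switched-off" events

def rnn_score(sequence):
    """Scalar score for one (T, D) sequence."""
    return 1.0 / (1.0 + np.exp(-np.sum(weights * sequence)))

def score_from_event_masks(masks):
    """Map binary event masks of shape (n_samples, T) to model scores.

    Each mask row marks which events are kept (1) or replaced by the
    background event (0); this is how event-level perturbations are built.
    """
    scores = []
    for mask in np.atleast_2d(masks):
        perturbed = np.where(mask[:, None] == 1, x_sequence, background_event)
        scores.append(rnn_score(perturbed))
    return np.array(scores)

# KernelSHAP over "events as features": the background is the all-zeros mask
# (every event replaced) and the instance to explain is the all-ones mask.
explainer = shap.KernelExplainer(score_from_event_masks, np.zeros((1, T)))
event_attributions = explainer.shap_values(np.ones(T), nsamples=200)

for t, phi in enumerate(event_attributions):
    print(f"event {t}: attribution {phi:+.4f}")
```

The abstract also mentions a pruning method for long sequences. As a rough sketch of one way such a grouped-coalition check could look (not the paper's exact algorithm), the sequence can be split into an "old" block and a "recent" block; the old block's exact Shapley value in that two-player game is cheap to compute, and the oldest events can be collapsed while that grouped attribution stays negligible.

```python
def block_score(keep_old, keep_recent, split, sequence, background, score_fn):
    """Score the sequence with whole blocks kept or replaced by the background."""
    perturbed = sequence.copy()
    if not keep_old:
        perturbed[:split] = background
    if not keep_recent:
        perturbed[split:] = background
    return score_fn(perturbed)

def old_block_attribution(split, sequence, background, score_fn):
    """Exact Shapley value of the "old events" block in the two-player game
    between {events before split} and {events from split onward}."""
    f_none = block_score(False, False, split, sequence, background, score_fn)
    f_old = block_score(True, False, split, sequence, background, score_fn)
    f_recent = block_score(False, True, split, sequence, background, score_fn)
    f_both = block_score(True, True, split, sequence, background, score_fn)
    return 0.5 * ((f_old - f_none) + (f_both - f_recent))

def prune_split(sequence, background, score_fn, tol=0.05):
    """Largest number of oldest events whose grouped attribution stays below tol."""
    for split in range(sequence.shape[0], -1, -1):
        if abs(old_block_attribution(split, sequence, background, score_fn)) < tol:
            return split
    return 0

# Reuses rnn_score, x_sequence and background_event from the sketch above.
split = prune_split(x_sequence, background_event, rnn_score, tol=0.05)
print(f"the {split} oldest events could be grouped; "
      f"explain events {split}..{T - 1} individually")
```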
