Paper title
On the Impact of Temporal Concept Drift on Model Explanations
Paper authors
Paper abstract
Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data from the same temporal distribution as the training data (i.e., synchronous settings). While model performance often deteriorates due to temporal variation (i.e., temporal concept drift), it is currently unknown how explanation faithfulness is affected when the time span of the target data differs from that of the data used to train the model (i.e., asynchronous settings). For this purpose, we examine the impact of temporal variation on model explanations extracted by eight feature attribution methods and three select-then-predict models across six text classification tasks. Our experiments show that (i) faithfulness is not consistent under temporal variations across feature attribution methods (e.g., it decreases or increases depending on the method), with an attention-based method demonstrating the most robust faithfulness scores across datasets; and (ii) select-then-predict models are mostly robust in asynchronous settings, with only small degradation in predictive performance. Finally, feature attribution methods show conflicting behavior when used in FRESH (i.e., a select-then-predict model) and when used for measuring sufficiency/comprehensiveness (i.e., as post-hoc methods), suggesting that we need more robust metrics to evaluate post-hoc explanation faithfulness.
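The abstract refers to sufficiency and comprehensiveness without defining them. As a rough illustration only, the sketch below follows one common formulation of these two scores: comprehensiveness is the drop in predicted probability when the rationale tokens are removed, and sufficiency is the drop when only the rationale tokens are kept. The paper may use normalized or otherwise modified variants; `toy_model`, its lexicon, and the function names are hypothetical and exist purely for this example.

```python
import math
from typing import Callable, Sequence, Set

Model = Callable[[Sequence[str]], float]


def toy_model(tokens: Sequence[str]) -> float:
    """Hypothetical classifier: P(positive) from a tiny hand-made lexicon."""
    weights = {"great": 2.0, "good": 1.0, "bad": -1.5}
    score = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def comprehensiveness(model: Model, tokens: Sequence[str], rationale: Set[int]) -> float:
    """p(y | x) - p(y | x with rationale tokens removed). Higher is better."""
    remainder = [t for i, t in enumerate(tokens) if i not in rationale]
    return model(tokens) - model(remainder)


def sufficiency(model: Model, tokens: Sequence[str], rationale: Set[int]) -> float:
    """p(y | x) - p(y | rationale tokens only). Lower is better."""
    rationale_only = [t for i, t in enumerate(tokens) if i in rationale]
    return model(tokens) - model(rationale_only)


tokens = ["the", "film", "was", "great"]
rationale = {3}  # indices a feature attribution method ranked highest (assumed)
comp = comprehensiveness(toy_model, tokens, rationale)
suff = sufficiency(toy_model, tokens, rationale)
```

In an asynchronous evaluation of the kind the abstract describes, the same two functions would be applied to test examples drawn from a later time span than the training data, and the scores compared against the synchronous case.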