Paper title
A Sea of Words: An In-Depth Analysis of Anchors for Text Data
Paper authors
Paper abstract
Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs whenever these words are present in a document. In this paper, we present the first theoretical analysis of Anchors, under the assumption that the search for the best anchor is exhaustive. After formalizing the algorithm for text classification, we present explicit results on different classes of models when the vectorization step is TF-IDF and removed words are replaced by a fixed out-of-dictionary token. Our inquiry covers models such as elementary if-then rules and linear classifiers. We then leverage this analysis to gain insights into the behavior of Anchors for any differentiable classifier. For neural networks, we empirically show that the words corresponding to the highest partial derivatives of the model with respect to the input, reweighted by the inverse document frequencies, are selected by Anchors.
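The setting described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation or the reference Anchors code: the vocabulary, the `IDF` and `COEF` values, the `UNK` token name, and the helper names `predict`, `precision`, and `best_anchor` are all hypothetical. It also simplifies the method by using unnormalized TF-IDF, by enumerating every keep/replace pattern with equal weight instead of sampling, and by returning the shortest anchor that reaches the precision threshold.

```python
from itertools import combinations, product

UNK = "UNK"  # fixed out-of-dictionary token (the name is an assumption)

# Hypothetical inverse document frequencies and linear coefficients.
# "movie" has the largest coefficient, "great" the largest coefficient * idf.
IDF = {"great": 3.0, "movie": 1.0, "boring": 2.0, "plot": 1.5}
COEF = {"great": 1.0, "movie": 1.2, "boring": -0.25, "plot": 0.2}
INTERCEPT = -2.0


def predict(words):
    """Linear classifier on unnormalized TF-IDF; UNK contributes nothing."""
    score = INTERCEPT
    for w in words:
        # Summing coef * idf per occurrence equals coef * tf * idf per term.
        score += COEF.get(w, 0.0) * IDF.get(w, 0.0)
    return int(score > 0)


def precision(doc, anchor):
    """Share of perturbed documents that keep the original prediction.

    Positions in `anchor` are always kept. Every other position is either
    kept or replaced by UNK, and all such patterns are enumerated.
    """
    target = predict(doc)
    free = [i for i in range(len(doc)) if i not in anchor]
    same = total = 0
    for mask in product([False, True], repeat=len(free)):
        perturbed = list(doc)
        for i, drop in zip(free, mask):
            if drop:
                perturbed[i] = UNK
        same += predict(perturbed) == target
        total += 1
    return same / total


def best_anchor(doc, threshold=0.95):
    """Exhaustive search for the shortest anchor reaching the threshold."""
    for size in range(len(doc) + 1):
        scored = [(precision(doc, set(a)), a)
                  for a in combinations(range(len(doc)), size)]
        ok = [(p, a) for p, a in scored if p >= threshold]
        if ok:
            # Among anchors of this size, prefer the highest precision.
            p, a = max(ok, key=lambda t: t[0])
            return [doc[i] for i in a], p
    return list(doc), 1.0  # not reached when threshold <= 1


doc = ["great", "movie", "boring", "plot"]
print(best_anchor(doc))
```

In this toy example the selected anchor is the single word with the largest product of coefficient and inverse document frequency, even though another word has the larger raw coefficient. This matches the qualitative behavior the abstract reports for linear classifiers, but only for these made-up numbers.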