Paper Title
Examining the causal structures of deep neural networks using information theory
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring "what does what" within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN's causal structure since it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the "causal plane", which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.
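To make the central quantity concrete, the following is a minimal sketch (not the authors' implementation) of how the layer-level EI described in the abstract could be estimated: inject a maximum-entropy (uniform) perturbation at a layer's input, pass it through the layer, discretize the input and output activations into bins, and compute the plug-in mutual information between the binned input and output states. The toy layer shape, the [0, 1] input range, the sigmoid activation, the bin count, and the sample size are illustrative assumptions, and the plug-in estimate is biased when the number of samples is small relative to the number of occupied bins.

```python
import numpy as np
from collections import Counter


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def discretize(acts, n_bins):
    """Map each activation vector (values in [0, 1]) to a tuple of bin indices."""
    bins = np.clip((acts * n_bins).astype(int), 0, n_bins - 1)
    return [tuple(row) for row in bins]


def mutual_information(xs, ys):
    """Plug-in mutual information estimate (in bits) between two symbol sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), with counts instead of probabilities
        mi += (c / n) * np.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def effective_information(W, b, n_samples=100_000, n_bins=8, seed=0):
    """Monte-Carlo EI estimate for one dense layer: mutual information between the
    layer's input and output when the input is driven by maximum-entropy (uniform) noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_samples, W.shape[0]))  # maximum-entropy perturbation
    y = sigmoid(x @ W + b)                                    # layer transfer function
    return mutual_information(discretize(x, n_bins), discretize(y, n_bins))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 2))  # toy 3 -> 2 dense layer with hypothetical weights
    b = np.zeros(2)
    print(f"EI estimate: {effective_information(W, b):.3f} bits")
```

Under the same assumptions, the sensitivity and degeneracy terms mentioned in the abstract could be approximated by applying this estimator edge-by-edge (perturbing one input node at a time) and comparing the summed per-edge information with the whole-layer EI.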