Paper Title

Explaining, Evaluating and Enhancing Neural Networks' Learned Representations

Paper Authors

Marco Bertolini, Djork-Arné Clevert, Floriane Montanari

Paper Abstract

Most efforts in interpretability in deep learning have focused on (1) extracting explanations of a specific downstream task in relation to the input features and (2) imposing constraints on the model, often at the expense of predictive performance. New advances in (unsupervised) representation learning and transfer learning, however, raise the need for an explanatory framework for networks that are trained without a specific downstream task. We address these challenges by showing how explainability can be an aid, rather than an obstacle, towards better and more efficient representations. Specifically, we propose a natural aggregation method generalizing attribution maps between any two (convolutional) layers of a neural network. Additionally, we employ such attributions to define two novel scores for evaluating the informativeness and the disentanglement of latent embeddings. Extensive experiments show that the proposed scores do correlate with the desired properties. We also confirm and extend previously known results concerning the independence of some common saliency strategies from the model parameters. Finally, we show that adopting our proposed scores as constraints during the training of a representation learning task improves the downstream performance of the model.

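The abstract only summarizes the approach, so the sketch below is not the paper's proposed aggregation method. It is a minimal, generic illustration (assuming PyTorch, with hypothetical blocks `block_a`/`block_b` and input sizes) of what an attribution map between two intermediate convolutional layers can look like, here computed as gradient-times-activation for a single embedding unit.

```python
import torch
import torch.nn as nn

# Two illustrative convolutional blocks; names, channel counts and the
# input size are placeholders, not taken from the paper.
block_a = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
block_b = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())

x = torch.randn(1, 3, 32, 32)

h = block_a(x)          # "source" layer activations, shape (1, 16, 32, 32)
h.retain_grad()         # keep the gradient of this non-leaf tensor
z = block_b(h)          # "target" latent embedding, shape (1, 32)

z[0, 0].backward()      # attribute a single embedding dimension

# Gradient-times-activation between the two layers, collapsed over
# channels into a spatial attribution map of shape (1, 32, 32).
attribution = (h.grad * h).sum(dim=1)
print(attribution.shape)
```

Any saliency method that assigns relevance from one layer's units to another's activations could be substituted for the gradient-times-activation rule used here.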