Paper Title
Measures of Information Reflect Memorization Patterns
Authors
Abstract
Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. On the other hand, networks have been shown to memorize training examples, resulting in example-level memorization. These kinds of memorization impede generalization of networks beyond their training distributions. Detecting such memorization could be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize -- and subsequently show -- that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. We quantify the diversity in the neural activations through information-theoretic measures and find support for our hypothesis on experiments spanning several natural language and vision tasks. Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection. The associated code and other resources for this work are available at https://rachitbansal.github.io/information-measures.
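To make the idea of quantifying activation diversity with information-theoretic measures more concrete, here is a minimal sketch. The abstract does not name the exact measures, so every choice below is an assumption made for illustration: histogram binning of activations, Shannon entropy per neuron, mutual information between a pair of neurons, and the hypothetical names `neuron_entropy`, `pairwise_mutual_information`, and `n_bins`. For the authors' actual method, see the code at the project page linked above.

```python
import numpy as np


def neuron_entropy(acts, n_bins=10):
    """Shannon entropy (bits) of each neuron's binned activations.

    acts: array of shape (n_examples, n_neurons).
    Binning into n_bins equal-width bins is an illustrative assumption.
    """
    n_examples, n_neurons = acts.shape
    ent = np.zeros(n_neurons)
    for j in range(n_neurons):
        counts, _ = np.histogram(acts[:, j], bins=n_bins)
        p = counts[counts > 0] / n_examples  # drop empty bins, normalize
        ent[j] = -np.sum(p * np.log2(p))
    return ent


def pairwise_mutual_information(a, b, n_bins=10):
    """Mutual information (bits) between two neurons' binned activations.

    a, b: 1-D arrays of equal length (one activation per example).
    """
    joint, _, _ = np.histogram2d(a, b, bins=n_bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of a, shape (n_bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, n_bins)
    mask = p_xy > 0                        # skip empty cells to avoid log(0)
    ratio = p_xy[mask] / (p_x @ p_y)[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))


# Toy activations: one neuron that varies across examples, one that is constant.
varied = np.arange(100, dtype=float)
constant = np.ones(100)
acts = np.stack([varied, constant], axis=1)

entropies = neuron_entropy(acts)
mi_self = pairwise_mutual_information(varied, varied)
mi_const = pairwise_mutual_information(varied, constant)
```

In this toy setup the constant neuron carries no information about the input, while the varied neuron spreads evenly over its bins. Activations for such an analysis need no labels, which matches the abstract's point that the measures can be computed on unlabelled in-distribution examples.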