Paper Title
Polysemanticity and Capacity in Neural Networks
Paper Authors
Paper Abstract
Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult and so we aim to understand its causes. We propose doing so through the lens of feature \emph{capacity}, which is the fractional dimension each feature consumes in the embedding space. We show that in a toy model the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features. Polysemanticity is more prevalent when the inputs have higher kurtosis or sparsity and more prevalent in some architectures than others. Given an optimal allocation of capacity, we go on to study the geometry of the embedding space. We find a block-semi-orthogonal structure, with differing block sizes in different models, highlighting the impact of model architecture on the interpretability of its neurons.
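The abstract defines capacity as the fractional dimension each feature consumes in the embedding space. A minimal pure-Python sketch of one plausible such measure: a feature's capacity is its embedding vector's squared norm relative to its summed squared overlaps with all feature embeddings. The exact formula is an illustrative assumption, not quoted from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def capacity(embeddings, i):
    """Assumed capacity measure: fraction of a dimension consumed by
    feature i, given all features' embedding vectors (illustrative only)."""
    overlaps = sum(dot(embeddings[i], e) ** 2 for e in embeddings)
    return dot(embeddings[i], embeddings[i]) ** 2 / overlaps

# Two orthogonal features in 2-D: each gets a full dimension (capacity 1.0),
# i.e. a monosemantic representation.
ortho = [[1.0, 0.0], [0.0, 1.0]]

# Three unit features crammed into 2-D at 120-degree angles: each capacity
# drops to 2/3, and the capacities sum to the embedding dimension (2),
# illustrating features sharing dimensions polysemantically.
crowded = [[math.cos(2 * math.pi * k / 3),
            math.sin(2 * math.pi * k / 3)] for k in range(3)]
```

Under this measure, allocating capacity across features is a zero-sum trade within the embedding dimension, which matches the abstract's framing of an optimal allocation that favors important features.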