Paper Title
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
Paper Authors
Paper Abstract
Based on the concepts of Wasserstein barycenter (WB) and Gromov-Wasserstein barycenter (GWB), we propose a unified mathematical framework for neural network (NN) model fusion and use it to reveal new insights into the linear mode connectivity of SGD solutions. In our framework, fusion occurs layer-wise and builds on an interpretation of each node in a network as a function of the layer preceding it. The versatility of the framework allows us to discuss model fusion and linear mode connectivity for a broad class of NNs, including fully connected NNs, CNNs, ResNets, RNNs, and LSTMs, in each case exploiting the specific structure of the network architecture. We present extensive numerical experiments to 1) illustrate the strengths of our approach relative to other model fusion methodologies and 2) provide, from a certain perspective, new empirical evidence for recent conjectures stating that two local minima found by gradient-based methods end up lying in the same basin of the loss landscape after a suitable permutation of weights is applied to one of the models.
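To make the linear mode connectivity claim concrete, the sketch below illustrates the basic recipe the abstract alludes to: permute the hidden units of one trained network to align them with another, then linearly interpolate the weights and check the loss along the path for a barrier. This is a minimal illustration, not the paper's WB/GWB fusion algorithm; the models, the function names (`align_hidden_units`, `interpolate`), and the use of a hard Hungarian assignment in place of optimal-transport couplings are all simplifying assumptions made here for exposition.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): align two
# hypothetical two-layer MLPs, each stored as (W1, b1, W2, b2) numpy arrays,
# by permuting the hidden units of the second, then interpolate in weight space.
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_hidden_units(model_a, model_b):
    """Permute model_b's hidden units to best match model_a's.

    Matching cost: negative inner product between incoming weight vectors,
    a crude stand-in for the ground cost of a Wasserstein-barycenter fusion.
    """
    W1_a, b1_a, W2_a, b2_a = model_a
    W1_b, b1_b, W2_b, b2_b = model_b
    cost = -W1_a @ W1_b.T                  # (hidden x hidden) dissimilarity
    _, perm = linear_sum_assignment(cost)  # hard assignment = permutation
    # Apply the same permutation to incoming rows and outgoing columns.
    return (W1_b[perm], b1_b[perm], W2_b[:, perm], b2_b)


def interpolate(model_a, model_b, t):
    """Linear interpolation (1 - t) * A + t * B in weight space."""
    return tuple((1 - t) * pa + t * pb for pa, pb in zip(model_a, model_b))


# Usage: after alignment, evaluate the training/test loss of
# interpolate(model_a, align_hidden_units(model_a, model_b), t)
# for t in np.linspace(0, 1, 11); a roughly flat curve (no loss barrier)
# is the behavior the linear mode connectivity conjecture predicts.
```

In the paper's framework the hard permutation above would be replaced by transport plans obtained from WB/GWB problems solved layer by layer, which also handle layers of different widths; the sketch only conveys the permute-then-interpolate intuition.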