Paper Title
Train-by-Reconnect: Decoupling Locations of Weights from their Values
Paper Authors
Paper Abstract
What makes untrained deep neural networks (DNNs) different from trained, performant ones? By zooming into the weights of well-trained DNNs, we find that the locations of the weights hold most of the information encoded by training. Motivated by this observation, we hypothesize that the weights of DNNs trained with stochastic gradient-based methods can be separated into two dimensions: the locations of the weights and their exact values. To assess this hypothesis, we propose a novel method named Lookahead Permutation (LaPerm), which trains DNNs by reconnecting their weights. We empirically demonstrate the versatility of LaPerm while producing extensive evidence to support our hypothesis: when the initial weights are random and dense, our method achieves speed and performance similar to or better than regular optimizers such as Adam; when the initial weights are random and sparse (many zeros), our method changes the way neurons connect and reaches accuracy comparable to that of a well-trained, fully initialized network; when the initial weights all share a single value, our method finds weight-agnostic neural networks with far better-than-chance accuracy.
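To make the notion of "reconnecting the weights" concrete, the following is a minimal sketch, not the paper's implementation: a layer's fixed pool of initial values is reassigned to new locations so that its rank order matches a reference tensor (for instance, weights produced by a few ordinary optimizer steps). The function name `permute_to_match` and the per-tensor (rather than per-neuron) permutation are illustrative assumptions.

```python
import numpy as np

def permute_to_match(initial_weights: np.ndarray, reference_weights: np.ndarray) -> np.ndarray:
    """Reassign the fixed multiset of initial values to new locations.

    The values come entirely from `initial_weights`; only their locations
    change, chosen so that their rank order follows `reference_weights`
    (e.g., weights obtained after a few inner optimizer steps).
    """
    flat_init = np.sort(initial_weights.ravel())    # fixed pool of values, ascending
    order = np.argsort(reference_weights.ravel())   # positions ranked by the reference
    permuted = np.empty_like(flat_init)
    permuted[order] = flat_init                     # k-th smallest value goes to the position
                                                    # holding the k-th smallest reference entry
    return permuted.reshape(initial_weights.shape)

# Toy usage: the resulting tensor keeps exactly the initial values,
# only relocated to follow the reference ordering.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 3))                        # random dense initial weights
w_ref = w0 - 0.1 * rng.normal(size=w0.shape)        # stand-in for updated weights
w_new = permute_to_match(w0, w_ref)
assert np.allclose(np.sort(w_new.ravel()), np.sort(w0.ravel()))  # same values, new locations
```

Under this reading, only the assignment of values to locations is learned, which is what the abstract's dense, sparse, and single-value experiments probe.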