Paper title
Deep orthogonal linear networks are shallow
Paper authors
Paper abstract
We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices with no non-linearity in between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that there is no effect of overparametrization and no implicit bias in this setting: training such a deep, overparametrized network is perfectly equivalent to training a one-layer shallow network.
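
The equivalence claimed in the abstract can be checked numerically. Below is a minimal sketch, assuming the geodesic retraction W <- expm(-eta * skew(grad_f(W) @ W.T)) @ W on the orthogonal group and a toy loss f(W) = 0.5 * ||W - T||_F^2; the depth p, the step size eta, and the depth-scaled step p * eta used for the shallow update are illustrative conventions of this sketch, not notation taken from the paper. One Riemannian gradient step on every factor of W_p ... W_1 reproduces, exactly, one Riemannian gradient step on the product itself.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, p, eta = 5, 4, 0.1  # dimension, depth, step size (illustrative)

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def product(mats):
    # mats = [W_1, ..., W_k]; returns W_k ... W_1 (W_1 applied first)
    out = np.eye(n)
    for M in mats:
        out = M @ out
    return out

def skew(M):
    return 0.5 * (M - M.T)

Ws = [random_orthogonal(n) for _ in range(p)]  # factors W_1, ..., W_p
T = random_orthogonal(n)                       # target for the toy loss

W = product(Ws)
G = W - T              # Euclidean gradient of f(W) = 0.5 * ||W - T||_F^2
A = skew(G @ W.T)      # relative form of the Riemannian gradient on O(n)

# One Riemannian gradient step on each factor W_i; the Euclidean gradient
# of f(W_p ... W_1) with respect to W_i is W_{i+1:p}^T G W_{1:i-1}^T.
new_Ws = []
for i in range(p):
    left = product(Ws[:i])        # W_{1:i-1} = W_{i-1} ... W_1
    right = product(Ws[i + 1:])   # W_{i+1:p} = W_p ... W_{i+1}
    G_i = right.T @ G @ left.T
    A_i = skew(G_i @ Ws[i].T)
    new_Ws.append(expm(-eta * A_i) @ Ws[i])

deep_step = product(new_Ws)

# One Riemannian gradient step on the product, with the step scaled by depth
shallow_step = expm(-p * eta * A) @ W

print(np.allclose(deep_step, shallow_step))  # True, up to numerical error

The identity behind the collapse is that skew commutes with orthogonal conjugation: each factor's update expm(-eta * A_i) equals R.T @ expm(-eta * A) @ R with R = W_{i+1:p}, so the p conjugated exponentials telescope into the single shallow update expm(-p * eta * A).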