Paper Title
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Paper Authors
Paper Abstract
Strictly enforcing orthonormality constraints on parameter matrices has been shown to be advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) a new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) an implicit vector transport mechanism that combines a projection of the momentum with the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum and Cayley ADAM on the Stiefel manifold. The convergence of Cayley SGD is analyzed theoretically. Our experiments on CNN training demonstrate that both algorithms: (a) require less running time per iteration than existing approaches that enforce orthonormality of CNN parameters; and (b) achieve faster convergence than the baseline SGD and ADAM algorithms without compromising CNN performance. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
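To make the abstract's key idea concrete, below is a minimal sketch of a single Cayley-transform descent step on the Stiefel manifold. The skew-symmetric construction of W and the inverse-free fixed-point iteration follow the retraction described above, but the function name `cayley_sgd_step`, the hyperparameter defaults, and the initialization of the iteration are illustrative assumptions, not the paper's exact algorithm (which additionally incorporates momentum and an adaptive variant).

```python
import torch

def cayley_sgd_step(X, G, lr=0.1, iters=2):
    """Hedged sketch of one Cayley-transform descent step on the Stiefel
    manifold. X is an (n, p) parameter matrix with orthonormal columns
    (X.T @ X = I), and G is the Euclidean gradient dL/dX.
    `lr` and `iters` are illustrative values, not the paper's defaults."""
    # Skew-symmetric matrix whose action W @ X is a tangent (Riemannian
    # gradient) direction at X under the canonical metric.
    W_hat = G @ X.T - 0.5 * X @ (X.T @ G @ X.T)
    W = W_hat - W_hat.T  # W = -W^T by construction

    # Fixed-point iteration approximating the Cayley retraction
    #   Y(lr) = (I + (lr/2) W)^{-1} (I - (lr/2) W) @ X
    # without forming a matrix inverse; a couple of iterations suffice
    # because the step lr * W is small.
    Y = X - lr * (W @ X)  # first-order initialization (assumed)
    for _ in range(iters):
        Y = X - 0.5 * lr * (W @ (X + Y))
    return Y

# Quick sanity check: orthonormality is preserved up to the truncation
# error of the fixed-point iteration.
n, p = 8, 3
X, _ = torch.linalg.qr(torch.randn(n, p))
G = torch.randn(n, p)
X_new = cayley_sgd_step(X, G)
print(torch.dist(X_new.T @ X_new, torch.eye(p)))  # should be small
```

The point of the iterative form is visible in the loop: the exact Cayley transform requires solving a linear system (a matrix inverse), whereas a few cheap matrix multiplications approximate it well for small steps, which is what makes per-iteration cost lower than in existing orthonormality-enforcing methods.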