Paper Title
On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools
Paper Authors
Paper Abstract
Recurrent neural networks (RNNs) have been successfully applied to a variety of problems involving sequential data, but their optimization is sensitive to parameter initialization, architecture, and optimizer hyperparameters. Considering RNNs as dynamical systems, a natural way to capture stability, i.e., the growth and decay over long iterates, is the set of Lyapunov exponents (LEs), which form the Lyapunov spectrum. The LEs have a bearing on the stability of RNN training dynamics because the forward propagation of information is related to the backward propagation of error gradients. LEs measure the asymptotic rates of expansion and contraction of nonlinear system trajectories, and generalize stability analysis to the time-varying attractors structuring the non-autonomous dynamics of data-driven RNNs. As a tool to understand and exploit the stability of training dynamics, the Lyapunov spectrum fills an existing gap between prescriptive mathematical approaches of limited scope and computationally expensive empirical approaches. To leverage this tool, we implement an efficient way to compute LEs for RNNs during training, discuss the aspects specific to standard RNN architectures driven by typical sequential datasets, and show that the Lyapunov spectrum can serve as a robust readout of training stability across hyperparameters. With this exposition-oriented contribution, we hope to draw attention to this understudied but theoretically grounded tool for understanding training stability in RNNs.
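As a minimal sketch of the kind of computation the abstract refers to (not the paper's own implementation), the standard QR-based algorithm estimates the Lyapunov spectrum of a vanilla tanh RNN, h_{t+1} = tanh(W h_t + U x_t + b), by propagating an orthonormal frame through the Jacobians of the hidden-state map and accumulating the log-stretching factors from repeated QR decompositions. All function and variable names below are illustrative:

```python
import numpy as np

def rnn_lyapunov_spectrum(W, U, b, inputs, h0=None):
    """Estimate the Lyapunov spectrum of the tanh RNN
    h_{t+1} = tanh(W h_t + U x_t + b) with the QR-based
    (Benettin-style) algorithm, driven by the given inputs."""
    n = W.shape[0]
    h = np.zeros(n) if h0 is None else h0
    Q = np.eye(n)          # orthonormal frame of tangent vectors
    log_r = np.zeros(n)    # accumulated log-stretching factors
    for x in inputs:
        h = np.tanh(W @ h + U @ x + b)
        # Jacobian of the hidden-state map: diag(1 - h^2) @ W
        J = (1.0 - h**2)[:, None] * W
        # Evolve the frame and re-orthonormalize via QR
        Q, R = np.linalg.qr(J @ Q)
        # Fix signs so the diagonal of R is positive (standard convention)
        sign = np.sign(np.diag(R))
        sign[sign == 0] = 1.0
        Q = Q * sign
        log_r += np.log(np.abs(np.diag(R)) + 1e-300)
    # Time-averaged exponents, sorted in descending order
    return np.sort(log_r / len(inputs))[::-1]
```

In a contractive regime (e.g., a recurrent weight matrix with small spectral norm), all estimated exponents should be negative, reflecting exponential forgetting of the initial condition; a positive leading exponent would indicate chaotic, input-sensitive dynamics.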