Paper Title

Simplified State Space Layers for Sequence Modeling

Authors

Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman

Abstract

Models using structured state space sequence (S4) layers have achieved state-of-the-art performance on long-range sequence modeling tasks. An S4 layer combines linear state space models (SSMs), the HiPPO framework, and deep learning to achieve high performance. We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. Whereas an S4 layer uses many independent single-input, single-output SSMs, the S5 layer uses one multi-input, multi-output SSM. We establish a connection between S5 and S4, and use this to develop the initialization and parameterization used by the S5 model. The result is a state space layer that can leverage efficient and widely implemented parallel scans, allowing S5 to match the computational efficiency of S4, while also achieving state-of-the-art performance on several long-range sequence modeling tasks. S5 averages 87.4% on the long range arena benchmark, and 98.5% on the most difficult Path-X task.
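The abstract credits parallel scans for S5's efficiency. The sketch below illustrates why a linear state recurrence can be computed by a scan at all. It is a minimal illustration, not the authors' implementation. It assumes a diagonal state matrix (an assumption made here for simplicity, not stated in the abstract), a zero initial state, and real-valued entries. All names (`combine`, `scan`, `scan_states`, `sequential_states`, `lam`, `bu`) are hypothetical. The recurrence x_k = lam * x_{k-1} + bu_k is rewritten as an associative operator on pairs, so prefixes can be combined in any grouping, which is the property a parallel scan relies on.

```python
from typing import List, Tuple

# One scan element: (diagonal multiplier, accumulated input term).
Elem = Tuple[List[float], List[float]]


def combine(left: Elem, right: Elem) -> Elem:
    """Compose two elementwise maps x -> a * x + b; `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    a = [q * p for p, q in zip(a1, a2)]
    b = [q * s + t for q, s, t in zip(a2, b1, b2)]
    return a, b


def scan(elems: List[Elem]) -> List[Elem]:
    """Inclusive prefix scan by recursive pairing.

    Runs sequentially here, but every `combine` call within one level is
    independent of the others, which is what a parallel scan would exploit.
    """
    n = len(elems)
    if n == 1:
        return list(elems)
    pairs = [combine(elems[i], elems[i + 1]) for i in range(0, n - 1, 2)]
    scanned = scan(pairs)  # scanned[j] covers elems[0 .. 2j+1]
    out = [elems[0]]
    for i in range(1, n):
        if i % 2 == 1:
            out.append(scanned[i // 2])
        else:
            out.append(combine(scanned[i // 2 - 1], elems[i]))
    return out


def scan_states(lam: List[float], bu: List[List[float]]) -> List[List[float]]:
    """States x_1..x_L of x_k = lam * x_{k-1} + bu_k with x_0 = 0, via the scan."""
    return [b for _, b in scan([(lam, b) for b in bu])]


def sequential_states(lam: List[float], bu: List[List[float]]) -> List[List[float]]:
    """Reference: the same recurrence computed step by step."""
    x = [0.0] * len(lam)
    out = []
    for b in bu:
        x = [l * xi + bi for l, xi, bi in zip(lam, x, b)]
        out.append(x)
    return out
```

Calling `scan_states(lam, bu)` and `sequential_states(lam, bu)` on the same inputs should give the same states up to floating-point rounding. In an array framework, the same operator could be passed to a library scan primitive such as `jax.lax.associative_scan` rather than the hand-written recursion used here.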
