Paper Title

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

Paper Authors

Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

Paper Abstract

We propose multi-layer perceptron (MLP)-based architectures suitable for variable-length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the third relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best of the proposed MLP-based architectures improves WER by 1.0/0.9% on the Librispeech dev-clean/dev-other sets, 0.9/0.5% on the test-clean/test-other sets, and 0.8/1.1% on the Tedlium2 dev/test sets, while using 86.4% of the size of the self-attention-based architecture.
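
As a rough illustration of the three length-agnostic token-mixing operations mentioned in the abstract, the NumPy sketch below applies each one along the time axis of a (T, D) feature sequence. The function names, kernel shapes, channel grouping, and the use of a circular shift are assumptions made for exposition, not the paper's reference implementation.

import numpy as np

def fourier_mixing(x, w):
    # Circular convolution over time via the FFT (illustrative sketch).
    # x: (T, D) frames; w: (K, D) per-channel filter with a fixed K.
    # Zero-padding the filter to the current length T and multiplying in
    # the Fourier domain equals a circular convolution of length T.
    T = x.shape[0]
    w_pad = np.zeros_like(x)
    w_pad[: w.shape[0]] = w
    X = np.fft.rfft(x, axis=0)
    W = np.fft.rfft(w_pad, axis=0)
    return np.fft.irfft(X * W, n=T, axis=0)

def depthwise_mixing(x, w):
    # Per-channel 1-D convolution with a fixed kernel size K;
    # 'same' padding keeps the output length equal to T.
    out = np.empty_like(x)
    for d in range(x.shape[1]):
        out[:, d] = np.convolve(x[:, d], w[:, d], mode="same")
    return out

def shift_mixing(x, shifts=(-1, 0, 1)):
    # Split the channels into groups and shift each group along time;
    # a following channel-wise MLP (not shown) mixes the shifted features.
    # np.roll shifts circularly; a zero-padded shift is an alternative.
    groups = np.array_split(x, len(shifts), axis=1)
    return np.concatenate(
        [np.roll(g, s, axis=0) for g, s in zip(groups, shifts)], axis=1
    )

Because each operation relies only on a fixed-size filter or a fixed set of shifts along time, the same parameters apply unchanged to sequences of any length T, which is what makes the token mixing sequence-length agnostic.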
