Paper Title

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

Paper Authors

Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

Paper Abstract

We propose multi-layer perceptron (MLP)-based architectures suitable for variable-length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the third relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best of the proposed MLP-based architectures improves WER by 1.0/0.9% on the Librispeech dev-clean/dev-other sets, 0.9/0.5% on the test-clean/test-other sets, and 0.8/1.1% on the Tedlium2 dev/test sets, while using 86.4% of the size of the self-attention-based architecture.
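
As a rough illustration of the three length-agnostic token-mixing operations mentioned in the abstract, the NumPy sketch below applies each one along the time axis of a (T, D) feature sequence. The function names, kernel shapes, channel grouping, and the use of a circular shift are assumptions made for exposition, not the paper's reference implementation.

import numpy as np

def fourier_mixing(x, w):
    # Circular convolution over time via the FFT (illustrative sketch).
    # x: (T, D) frames; w: (K, D) per-channel filter with a fixed K.
    # Zero-padding the filter to the current length T and multiplying in
    # the Fourier domain equals a circular convolution of length T.
    T = x.shape[0]
    w_pad = np.zeros_like(x)
    w_pad[: w.shape[0]] = w
    X = np.fft.rfft(x, axis=0)
    W = np.fft.rfft(w_pad, axis=0)
    return np.fft.irfft(X * W, n=T, axis=0)

def depthwise_mixing(x, w):
    # Per-channel 1-D convolution with a fixed kernel size K;
    # 'same' padding keeps the output length equal to T.
    out = np.empty_like(x)
    for d in range(x.shape[1]):
        out[:, d] = np.convolve(x[:, d], w[:, d], mode="same")
    return out

def shift_mixing(x, shifts=(-1, 0, 1)):
    # Split the channels into groups and shift each group along time;
    # a following channel-wise MLP (not shown) mixes the shifted features.
    # np.roll shifts circularly; a zero-padded shift is an alternative.
    groups = np.array_split(x, len(shifts), axis=1)
    return np.concatenate(
        [np.roll(g, s, axis=0) for g, s in zip(groups, shifts)], axis=1
    )

Because each operation relies only on a fixed-size filter or a fixed set of shifts along time, the same parameters apply unchanged to sequences of any length T, which is what makes the token mixing sequence-length agnostic.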
