Paper Title
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
Paper Authors
Paper Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and the computational cost grows quadratically with respect to the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends on a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable Transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
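The abstract describes self-attention that attends over a short input segment together with a memory bank of embeddings summarizing previously processed segments. Below is a minimal single-head PyTorch sketch of that idea; the segmentation (left/right context frames omitted), the mean-pooled summarization query, and all names and parameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def augmented_memory_attention(segments, d_model):
    """Sketch: self-attention over a short segment plus a memory bank.

    segments: list of tensors, each of shape (seg_len, d_model), the
    input sequence split into fixed-size segments.
    Returns the per-segment attention outputs.
    """
    # Shared projections (hypothetical parameterization).
    w_q = torch.nn.Linear(d_model, d_model)
    w_k = torch.nn.Linear(d_model, d_model)
    w_v = torch.nn.Linear(d_model, d_model)

    memory_bank = []  # embeddings summarizing all processed segments
    outputs = []
    for seg in segments:
        # Summarization query: mean-pool the segment; its attention
        # output becomes the next memory slot.
        summary = seg.mean(dim=0, keepdim=True)           # (1, d_model)
        queries = w_q(torch.cat([seg, summary], dim=0))   # segment + summary

        # Keys/values span the memory bank and the current segment, so
        # each step sees the full left history at a fixed, bounded cost
        # instead of attending over the whole utterance.
        context = torch.cat(memory_bank + [seg], dim=0)
        keys, values = w_k(context), w_v(context)

        attn = F.softmax(queries @ keys.T / d_model ** 0.5, dim=-1)
        out = attn @ values                               # (seg_len+1, d_model)

        outputs.append(out[:-1])      # frame-level outputs for this segment
        memory_bank.append(out[-1:])  # store the summary as a new memory
    return outputs
```

Because each segment attends to a fixed-length window plus the growing (but compact, one vector per segment) memory bank, the per-segment cost stays bounded, which is what makes the model streamable.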