Title
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
Authors
Abstract
Streaming-style inference for encoder-decoder automatic speech recognition (ASR) systems is important for reducing latency, which is essential for interactive use cases. To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination. In endpoint prediction, we use the CTC posterior to compute the expected number of tokens that are yet to be emitted in the encoder features of the current blocks. Based on this expectation, the decoder predicts the endpoint so that decoding proceeds in continuous block synchronization, like a running stitch. Meanwhile, endpoint post-determination probabilistically detects a backward jump of the source-target attention, which is caused by mispredicted endpoints. It then discards the affected hypotheses and resumes decoding, like a back stitch. We combine these two methods into a hybrid approach, namely run-and-back stitch search, which reduces both computational cost and latency. Evaluations on various ASR tasks show the efficiency of the proposed decoding algorithm, which reduces latency, for instance on the Librispeech test set from 1487 ms to 821 ms at the 90th percentile, while maintaining high recognition accuracy.
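The endpoint prediction step can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below is an assumption rather than the authors' method: it treats the frame-wise CTC posteriors as independent and computes the expected length of the collapsed CTC output, counting a new token whenever a frame carries a non-blank label that differs from the previous frame's label. The function names, the `threshold` parameter, and the decision rule are hypothetical.

```python
from typing import Optional, Sequence


def expected_ctc_token_count(
    posteriors: Sequence[Sequence[float]], blank: int = 0
) -> float:
    """Expected number of tokens after CTC collapsing.

    Assumes frame labels are independent given the input, so a new token
    starts at frame t with label k with probability p_t(k) * (1 - p_{t-1}(k)).
    """
    expected = 0.0
    prev: Optional[Sequence[float]] = None
    for frame in posteriors:
        for k, p in enumerate(frame):
            if k == blank:
                continue  # blanks never emit a token
            p_prev = prev[k] if prev is not None else 0.0
            expected += p * (1.0 - p_prev)
        prev = frame
    return expected


def should_wait_for_next_block(
    posteriors: Sequence[Sequence[float]],
    num_emitted: int,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    blank: int = 0,
) -> bool:
    """Return True when few tokens are expected to remain in the blocks seen so far."""
    remaining = expected_ctc_token_count(posteriors, blank) - num_emitted
    return remaining < threshold


# One-hot posteriors over {blank, a, b} for the frame sequence: blank, a, a, blank, b
one_hot = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
count = expected_ctc_token_count(one_hot)
```

In this toy input the repeated `a` frames collapse to one token, so the expected count equals the two tokens `a b`. Under the assumed rule, a decoder that has already emitted both tokens would stop and wait for the next block, while one that has emitted none would keep decoding.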