Paper Title

Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Paper Authors

Tomer Wullach, Shlomo E. Chazan

Paper Abstract

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed using the predicted distribution. While showing substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., the predicted distribution is massed for a single or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may hamper beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. Our proposed approach does not require further training beyond the original fine-tuning, nor additional model parameters. In fact, we find that our proposed method requires significantly less inference computation than current approaches. We propose aggregating the top M layers, potentially leveraging useful information encoded in intermediate layers, and relaxing model confidence. We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements in particular when applied to low-resource scenarios.
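To make the described decoding idea concrete, below is a minimal NumPy sketch of one plausible reading of the abstract: average the per-frame logits of the top M transformer layers, then soften ("relax") the peaked distribution with a temperature before handing it to a beam-search decoder. The function name, the mean aggregation, and the temperature parameter are illustrative assumptions for this sketch, not the authors' reference implementation.

```python
import numpy as np

def relaxed_layer_aggregated_probs(layer_logits, top_m=3, temperature=2.0):
    """Sketch (our reading of the abstract, not the paper's exact method):
    average the logits of the top-M layers, then apply a temperature to
    relax the over-confident distribution before beam search.

    layer_logits: array of shape (num_layers, num_frames, vocab_size),
                  hypothetical per-layer CTC logits from a fine-tuned SSL ASR model.
    """
    # Aggregate the last M layers (assumption: a simple mean; the paper may weight them differently).
    aggregated = layer_logits[-top_m:].mean(axis=0)            # (num_frames, vocab_size)

    # Relax confidence: temperature > 1 flattens the peaked distribution,
    # letting beam search keep a more diverse set of candidates alive.
    scaled = aggregated / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)               # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs  # frame-level probabilities to feed into a CTC beam-search decoder


if __name__ == "__main__":
    # Toy example: 12 layers, 50 frames, 32-token vocabulary of random logits.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(12, 50, 32)).astype(np.float32)
    probs = relaxed_layer_aggregated_probs(logits, top_m=4, temperature=1.5)
    print(probs.shape, probs.sum(axis=-1)[:3])                 # (50, 32), each row sums to 1
```

In this sketch the relaxation adds no parameters and no extra forward passes; only the already-computed layer outputs are reused, which is consistent with the abstract's claim of no additional training or model parameters.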
