Paper Title

Efficient Utilization of Large Pre-Trained Models for Low Resource ASR

Paper Authors

Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney

Paper Abstract

Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data. Following this, hardware limitations and applications give rise to the question of how to take advantage of large pre-trained models efficiently and reduce their complexity. In this work, we study a challenging low resource conversational telephony speech corpus from the medical domain in Vietnamese and German. We show the benefits of using unsupervised techniques beyond simple fine-tuning of large pre-trained models, discuss how to adapt them to a practical telephony task including bandwidth transfer, and investigate different data conditions for pre-training and fine-tuning. We outperform the project baselines by 22% relative using pre-training techniques. Further gains of 29% can be achieved by refinements of architecture and training, and 6% by adding 0.8 h of in-domain adaptation data.
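As background to the abstract's fine-tuning setup, the sketch below shows one common way to adapt a large pre-trained speech model to a small labeled dataset: attaching a CTC head to a pre-trained wav2vec 2.0 encoder and updating it on supervised data. This is a minimal illustration using the Hugging Face transformers API, not the paper's actual setup; the checkpoint name, learning rate, and toy batch are assumptions for demonstration only.

```python
# Minimal sketch (assumptions, not the paper's setup): fine-tune a publicly
# available pre-trained wav2vec 2.0 checkpoint with a CTC head on a tiny
# labeled batch, using the Hugging Face transformers API.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "facebook/wav2vec2-base-960h"  # assumed checkpoint; the paper's models differ
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
model.freeze_feature_encoder()  # keep the convolutional feature extractor frozen

# Toy batch: 1 s of silence at 16 kHz with a dummy transcript, only to show the API shape.
audio = [torch.zeros(16000).numpy()]
inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
labels = processor.tokenizer(["HELLO"], return_tensors="pt", padding=True).input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
loss = model(input_values=inputs.input_values, labels=labels).loss  # CTC loss
loss.backward()
optimizer.step()
print(f"CTC loss on toy batch: {loss.item():.3f}")
```

In practice, a real low-resource fine-tuning run would iterate over a proper dataset, mask padded label positions, and tune the learning-rate schedule and number of frozen layers; the snippet above only illustrates the API shape of fine-tuning a pre-trained encoder with CTC supervision.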
