Paper Title


Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Authors

Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Abstract


Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. Moreover, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold; the dissimilarities among different unlabeled data are often ignored. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training, to maximize the utilization of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0, and the fine-tuning stage employs an improved semi-supervised learning algorithm derived from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also incorporate a temporal pseudo label pool and an exponential moving average to control the pseudo labels' update frequency and to avoid model divergence. Experimental results on the Libri-Light and LibriSpeech datasets show that our proposed method achieves better performance than existing approaches while being more unified.
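The abstract mentions two stabilizing ingredients during semi-supervised fine-tuning: an exponential moving average (EMA) of model weights and a temporal pseudo label pool that throttles how often pseudo labels are regenerated. The following is a minimal Python/PyTorch sketch of these two ideas only, not the paper's actual implementation; the names ema_update, PseudoLabelPool, decode_fn, and all hyperparameter values are hypothetical placeholders introduced for illustration.

```python
from collections import deque

import torch


def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999) -> None:
    """Update the teacher as an exponential moving average of the student's weights."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            # teacher <- decay * teacher + (1 - decay) * student
            t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)


class PseudoLabelPool:
    """Fixed-size FIFO pool of (utterance_id, pseudo_label) pairs.

    Pseudo labels are regenerated only every `refresh_every` steps, so the
    targets used for the unsupervised loss change slowly over time.
    """

    def __init__(self, max_size: int, refresh_every: int):
        self.pool = deque(maxlen=max_size)  # oldest entries are evicted automatically
        self.refresh_every = refresh_every
        self.step = 0

    def maybe_refresh(self, teacher, unlabeled_batch, decode_fn):
        """Periodically decode a batch of unlabeled audio with the EMA teacher.

        `decode_fn` is a hypothetical callback that runs inference and returns
        one transcript per utterance in the batch.
        """
        self.step += 1
        if self.step % self.refresh_every == 0:
            with torch.no_grad():
                labels = decode_fn(teacher, unlabeled_batch)
            self.pool.extend(zip(unlabeled_batch["ids"], labels))

    def sample(self):
        """Return the current pool contents for building the unsupervised loss."""
        return list(self.pool)
```

In a training loop, one would typically call ema_update after each optimizer step and draw unsupervised targets from the pool rather than from the latest student outputs, which is the rough mechanism the abstract attributes to avoiding model divergence.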
