Paper Title
Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech
Authors
Abstract
Training a robust automatic speech recognition (ASR) system for children's speech is challenging because of inherent differences between the acoustic attributes of adult and child speech and the scarcity of publicly available children's speech datasets. This paper introduces a novel segmental spectrum warping and perturbations in formant energy to generate a children-like speech spectrum from an adult speech spectrum. The modified adult spectrum is then used as augmented data to improve end-to-end ASR systems for children's speech recognition. Compared to the vocal tract length perturbation (VTLP) baseline system trained on the Librispeech 100-hour adult speech dataset, the proposed data augmentation methods give 6.5% and 6.1% relative reductions in word error rate (WER) on the children's dev and test sets, respectively. When children's speech data is added to training alongside the Librispeech set, the proposed methods give 3.7% and 5.1% relative reductions in WER compared to the VTLP baseline system.
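To make the idea of warping an adult spectrum toward a child-like one concrete, the following is a minimal Python sketch of the generic piecewise-linear frequency warp commonly used for VTLP, which the abstract names as the baseline. It is not the paper's segmental spectrum warping or its formant energy perturbation, whose details the abstract does not give. The function names, the boundary frequency `f_hi = 4800.0` Hz, the 16 kHz sample rate, and the linear-frequency magnitude spectrogram layout are all assumptions made for illustration.

```python
import numpy as np


def vtlp_warp_frequencies(freqs, alpha, f_hi, nyquist):
    """Piecewise-linear VTLP-style warp of a frequency axis (hypothetical helper).

    Frequencies below a boundary are scaled by alpha; above it, a second
    linear segment is used so that the Nyquist frequency maps to itself.
    """
    freqs = np.asarray(freqs, dtype=float)
    boundary = f_hi * min(alpha, 1.0) / alpha
    low = freqs * alpha
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    high = nyquist - slope * (nyquist - freqs)
    return np.where(freqs <= boundary, low, high)


def warp_spectrogram(spec, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp a (n_bins, n_frames) linear-frequency magnitude spectrogram.

    Energy at source frequency f is moved to warped(f). With alpha > 1,
    spectral peaks shift upward, roughly mimicking a shorter vocal tract.
    """
    spec = np.asarray(spec, dtype=float)
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, spec.shape[0])
    warped = vtlp_warp_frequencies(freqs, alpha, f_hi, nyquist)
    out = np.empty_like(spec)
    for t in range(spec.shape[1]):
        # Resample each frame: the warped axis holds the new positions of
        # the original bins, and we read the result back on the original grid.
        out[:, t] = np.interp(freqs, warped, spec[:, t])
    return out


if __name__ == "__main__":
    # Toy frame with a single spectral peak at 1000 Hz (bin 32 of 257).
    frame = np.zeros((257, 1))
    frame[32, 0] = 1.0
    shifted = warp_spectrogram(frame, alpha=1.2)
    print("peak bin before:", 32, "after:", int(np.argmax(shifted[:, 0])))
```

A single global factor `alpha` is the simplest possible choice here; a segmental approach, as the abstract's wording suggests, would presumably apply different warps to different frequency regions, but that is not shown in this sketch.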