Training Wake Word Detection with Synthesized Speech Data on Confusion Words

Paper title

Training Wake Word Detection with Synthesized Speech Data on Confusion Words

Authors

Yan Jia, Zexin Cai, Murong Ma, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li

Abstract

Confusing words are commonly encountered in real-life keyword spotting (KWS) applications and can severely degrade performance, because spoken terms are complex and many words sound similar to the predefined keywords. To make wake word detection systems more robust in such scenarios, we investigate two data augmentation setups for training end-to-end KWS systems: one adds synthesized data from a multi-speaker speech synthesis system, and the other adds random noise to the acoustic features. Experimental results show that the augmentations improve the system's robustness. Moreover, augmenting the training set with synthetic data generated by the multi-speaker text-to-speech system yields a significant improvement in the confusing-words scenario.
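The second augmentation, adding random noise to the acoustic features, can be illustrated with a minimal sketch. The abstract does not state the noise distribution, its scale, or which features are used, so this example assumes zero-mean Gaussian noise added independently to every coefficient of a frames-by-dimensions feature matrix (for instance log-mel filterbanks). The function name `add_feature_noise` and its `std` and `seed` parameters are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Optional

# Assumed layout: one list of feature coefficients per frame (e.g. log-mel bins).
Feature = List[List[float]]


def add_feature_noise(features: Feature, std: float = 0.1,
                      seed: Optional[int] = None) -> Feature:
    """Return a noisy copy of `features` (hypothetical helper).

    Assumption: the noise is zero-mean Gaussian with standard deviation
    `std`, drawn independently for every coefficient. The input is not
    modified, so the clean features remain available for other epochs.
    """
    if std < 0:
        raise ValueError("std must be non-negative")
    rng = random.Random(seed)  # seeded generator for reproducible augmentation
    return [[x + rng.gauss(0.0, std) for x in frame] for frame in features]
```

For example, `add_feature_noise(clean, std=0.1, seed=0)` on a 100-frame, 40-dimensional utterance returns a new 100x40 matrix whose entries are perturbed around the originals. Drawing fresh noise each time an utterance is sampled (for instance by passing `seed=None`) exposes the model to slightly different versions of the same training example across epochs.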