Paper Title
Towards a Common Speech Analysis Engine
Paper Authors
Abstract
Recent innovations in self-supervised representation learning have led to remarkable advances in natural language processing. In the speech processing domain, however, systems based on self-supervised representation learning are not yet considered state-of-the-art. We propose leveraging recent advances in self-supervised speech processing to create a common speech analysis engine. Such an engine should handle multiple speech processing tasks with a single architecture while achieving state-of-the-art accuracy. It must also support new tasks for which only small training datasets are available. Beyond that, a common engine should be capable of supporting distributed training on clients' in-house private data. We present an architecture for a common speech analysis engine based on the HuBERT self-supervised speech representation. We report experimental results for language identification on the standard NIST-LRE 07 evaluation and for emotion recognition on IEMOCAP. Our results surpass the state-of-the-art performance reported so far on these tasks. We also analyze our engine on the emotion recognition task using reduced amounts of training data and show how to achieve improved results.