Paper Title

Improving generalizability of distilled self-supervised speech processing models under distorted settings

Authors

Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee

Abstract

Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to match the needs of on-device speech applications. Though having similar performance as original SSL models, distilled counterparts suffer from performance degradation even more than their original versions in distorted environments. This paper proposes to apply Cross-Distortion Mapping and Domain Adversarial Training to SSL models during knowledge distillation to alleviate the performance gap caused by the domain mismatch problem. Results show consistent performance improvements under both in- and out-of-domain distorted setups for different downstream tasks while keeping efficient model size.
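
The abstract names two techniques but includes no code, so the following is a minimal PyTorch sketch of how Cross-Distortion Mapping and Domain Adversarial Training could be combined in a single distillation step. It is not the authors' implementation: the GradReverse function, the train_step helper, the toy linear encoders, the L1 distillation loss, and the additive-noise "distortion" are all illustrative assumptions (the paper distills from a HuBERT-style SSL teacher on genuinely distorted speech).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambd on the way back."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

# Toy stand-ins for the real models: the teacher would be a frozen SSL model
# (e.g. HuBERT) and the student its distilled counterpart (e.g. DistilHuBERT).
teacher = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256)).eval()
student = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 256))
domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))

def train_step(clean, distorted, domain_labels, lambd=0.1):
    # Teacher targets come from the clean signal only.
    with torch.no_grad():
        target = teacher(clean)
    # Cross-distortion mapping: the student encodes both the clean and the
    # distorted view, and both must match the teacher's clean-speech targets.
    pred_clean = student(clean)
    pred_dist = student(distorted)
    distill_loss = F.l1_loss(pred_clean, target) + F.l1_loss(pred_dist, target)
    # Domain adversarial training: a classifier tries to tell clean features
    # from distorted ones; the reversed gradient pushes the student toward
    # domain-invariant representations.
    feats = torch.cat([pred_clean, pred_dist], dim=0)
    logits = domain_clf(GradReverse.apply(feats, lambd))
    adv_loss = F.cross_entropy(logits, domain_labels)
    return distill_loss + adv_loss

# Usage with toy data: additive noise stands in for a real distortion pipeline
# (noise, reverberation, etc.).
clean = torch.randn(4, 80)
distorted = clean + 0.1 * torch.randn_like(clean)
labels = torch.cat([torch.zeros(4), torch.ones(4)]).long()  # 0 = clean, 1 = distorted
loss = train_step(clean, distorted, labels)
loss.backward()
```

In this sketch the reversal coefficient lambd trades off domain invariance against distillation quality; how the paper actually schedules or weights the adversarial term is not stated in the abstract.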
