Paper Title
Exploring Efficient-tuning Methods in Self-supervised Speech Models
Paper Authors
Paper Abstract
In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning pre-trained models for each downstream task is parameter-inefficient, since SSL models are notoriously large, with millions of parameters. Adapters are lightweight modules commonly used in NLP to address this problem. In downstream tasks, the parameters of the SSL model are frozen, and only the adapters are trained. Given the lack of studies that broadly explore the effectiveness of adapters for self-supervised speech tasks, we intend to fill this gap by adding various adapter modules to pre-trained speech SSL models. We show that performance parity can be achieved with over 90% parameter reduction, and we discuss the pros and cons of efficient tuning techniques. This is the first comprehensive investigation of various adapter types across speech tasks.
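To make the adapter idea in the abstract concrete, the following is a minimal sketch (not the paper's implementation): a bottleneck adapter with a residual connection attached to a frozen layer, so that only the adapter parameters receive gradients. The class names, hidden size (768), and bottleneck size (32) are illustrative assumptions, and a toy linear layer stands in for an SSL transformer layer.

```python
# Minimal bottleneck-adapter sketch (illustrative; not the paper's exact code).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, plus a residual connection."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual path leaves the frozen backbone's output intact;
        # the adapter only learns a small correction for the downstream task.
        return x + self.up(self.act(self.down(x)))


class AdaptedLayer(nn.Module):
    """Wraps a frozen pre-trained layer and applies a trainable adapter after it."""

    def __init__(self, frozen_layer: nn.Module, hidden_dim: int = 768):
        super().__init__()
        self.layer = frozen_layer
        self.adapter = BottleneckAdapter(hidden_dim)
        # Freeze the pre-trained parameters; only the adapter is trained.
        for p in self.layer.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))


if __name__ == "__main__":
    # Toy stand-in for one SSL transformer layer with hidden size 768.
    backbone_layer = nn.Linear(768, 768)
    adapted = AdaptedLayer(backbone_layer, hidden_dim=768)

    trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
    total = sum(p.numel() for p in adapted.parameters())
    print(f"trainable parameters: {trainable} / {total}")
```

In this toy setup, only the adapter's roughly 50K parameters out of about 640K total are trainable, which illustrates how inserting small bottleneck modules into a frozen backbone can yield the large parameter reductions discussed in the abstract.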