Paper Title
On Compressing Sequences for Self-Supervised Speech Models
Paper Authors
Paper Abstract
Compressing self-supervised models has become increasingly necessary as these models grow larger. While previous approaches have primarily focused on reducing model size, shortening sequences is also effective at reducing computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning, and we explore how sensitive individual downstream tasks are to input frame rates. Subsampling while training self-supervised models not only improves the overall performance on downstream tasks at certain frame rates, but also brings significant speed-ups in inference. Variable-length subsampling performs particularly well at low frame rates. In addition, if we have access to phonetic boundaries, we find no degradation in performance at an average frame rate as low as 10 Hz.
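To make the two subsampling schemes concrete, here is a minimal sketch, not the paper's actual implementation: fixed-length subsampling as strided average pooling over consecutive frames, and variable-length subsampling as mean-pooling within segments delimited by given boundaries (e.g. phonetic boundaries). The 50 Hz input rate, feature dimension, and boundary indices below are illustrative assumptions.

```python
import torch

def fixed_length_subsample(frames: torch.Tensor, stride: int) -> torch.Tensor:
    """Average-pool every `stride` consecutive frames of a (T, D) sequence;
    e.g. stride=2 halves a 50 Hz feature sequence to 25 Hz."""
    T, D = frames.shape
    T_trim = (T // stride) * stride          # drop the ragged tail
    return frames[:T_trim].reshape(-1, stride, D).mean(dim=1)

def variable_length_subsample(frames: torch.Tensor,
                              boundaries: list[int]) -> torch.Tensor:
    """Mean-pool frames within each segment delimited by `boundaries`
    (frame indices), yielding one vector per segment regardless of
    segment duration."""
    edges = [0] + boundaries + [frames.shape[0]]
    segments = [frames[a:b].mean(dim=0)
                for a, b in zip(edges, edges[1:]) if b > a]  # skip empties
    return torch.stack(segments)

# Usage: 100 frames of hypothetical 768-dim features, subsampled both ways.
x = torch.randn(100, 768)
print(fixed_length_subsample(x, stride=2).shape)         # torch.Size([50, 768])
print(variable_length_subsample(x, [30, 55, 80]).shape)  # torch.Size([4, 768])
```

With variable-length subsampling, the output rate adapts to the segmentation: coarser boundaries yield fewer vectors, which is how an average frame rate as low as 10 Hz can be reached when phonetic boundaries are available.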