Paper Title

End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

Authors

Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract

In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the Stuttering Sub-Challenge. In particular, we exploit embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervision-based SD system achieves a UAR of 36.9% and 41.0% on the validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR over the CBL by 33.81% and 5.45% on the validation and test sets respectively. Finally, we demonstrate that summing the information across all layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91% and 5.69% on the validation and test sets respectively. Grand-challenge: Computational Paralinguistics ChallengE
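As a rough illustration of the three embedding strategies the abstract describes (single-layer Wav2Vec2.0 embeddings, summing across all layers, and concatenation with MFCCs), the sketch below uses the Hugging Face transformers and torchaudio APIs. It is a minimal sketch only: the checkpoint name (facebook/wav2vec2-base-960h), the layer choice, the mean-pooling over time, and n_mfcc=40 are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

CKPT = "facebook/wav2vec2-base-960h"  # assumed checkpoint, not specified in the abstract
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CKPT)
model = Wav2Vec2Model.from_pretrained(CKPT).eval()

waveform, sr = torchaudio.load("clip.wav")  # mono speech clip
if sr != 16000:  # Wav2Vec2.0 expects 16 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = extractor(waveform.squeeze(0).numpy(),
                   sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors, each of shape (1, T, 768)
hidden = torch.stack(out.hidden_states)            # (L+1, 1, T, 768)

# (a) single-layer embedding, mean-pooled over time -> (768,)
layer_emb = hidden[-1].mean(dim=1).squeeze(0)

# (b) summing information across all layers, then pooling -> (768,)
summed_emb = hidden.sum(dim=0).mean(dim=1).squeeze(0)

# (c) concatenating a layer embedding with MFCC features -> (768 + 40,)
mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)(waveform)
fused_emb = torch.cat([layer_emb, mfcc.mean(dim=-1).squeeze(0)])
```

Fixed-length vectors like these would then feed the downstream SD classifiers benchmarked against the CBL. UAR here is the unweighted average recall, i.e. recall macro-averaged over the classes.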
