Paper Title


Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices

Authors

Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

Abstract


Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, privacy protection is a growing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech representations, including wav2vec 2.0, HuBERT, and UniSpeech. Converted voices retain a low word error rate, within 1% of the original speech. The equal error rate increases from 1.52% to 46.24% on the LibriSpeech test set and from 3.75% to 45.84% on speakers from the VCTK corpus, which signifies degraded speaker verification performance. Lastly, we conduct experiments on dysarthric speech data to show that speech features relevant to articulation, prosody, phonation, and phonology can be extracted from anonymized voices for discriminating between healthy and pathological speech.
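The equal error rate (EER) reported above is the standard speaker-verification metric: the operating point where the false acceptance rate equals the false rejection rate, so an anonymized system approaching 50% EER means trial scores no longer distinguish same-speaker from different-speaker pairs. The paper does not publish its scoring code; the following is a minimal, hedged sketch of how EER can be computed from a list of verification scores, using only NumPy (the function name and threshold-sweep strategy are our own illustration, not the authors' implementation):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER from verification trial scores.

    scores: similarity scores, higher = more likely same speaker.
    labels: 1 for genuine (same-speaker) trials, 0 for impostor trials.
    Sweeps every observed score as a decision threshold and returns the
    operating point where FAR and FRR are closest (averaging the two).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    genuine = scores[labels == 1]
    impostor = scores[labels == 0]

    eer, best_gap = 1.0, np.inf
    for t in np.sort(np.unique(scores)):
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With well-separated scores the EER approaches 0; when anonymization works as reported, genuine and impostor scores overlap heavily and the EER climbs toward 0.5 (i.e., the ~46% figures above).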
