Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

Paper title

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

Authors

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

Abstract

In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data in any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech comes from a domain unseen in the training data. This study analyzed the bottleneck of the anonymization system under unseen conditions. The domain mismatch (e.g., in language and channel) between the training and test data was found to affect both the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data helped reduce its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy significantly alleviated the mismatch in the anonymized speaker vectors. Audio samples and source code are available online.
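
The abstract names correlation alignment as the domain adaptation strategy but does not spell it out. The following is a minimal sketch of generic correlation alignment on fixed-length vectors, not the authors' released code. The function names, the `eps` regularizer, the added mean shift, and the choice of which vector set plays "source" and which plays "target" are all assumptions made for illustration.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power.

    Uses the eigendecomposition mat = V diag(w) V^T, so that
    mat**power = V diag(w**power) V^T.
    """
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral_align(source, target, eps=1e-3):
    """Align the mean and covariance of `source` rows to those of `target` rows.

    source: (n_s, d) array of vectors from one domain (hypothetical input).
    target: (n_t, d) array of vectors from the other domain (hypothetical input).
    eps:    small ridge added to both covariances so they stay invertible
            (an assumption of this sketch).
    Returns an (n_s, d) array of adapted source vectors.
    """
    dim = source.shape[1]
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)

    # Regularized covariance of each domain.
    cov_src = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_tgt = np.cov(target, rowvar=False) + eps * np.eye(dim)

    # Whiten with the source covariance, then re-color with the target one.
    whiten = _sym_matrix_power(cov_src, -0.5)
    recolor = _sym_matrix_power(cov_tgt, 0.5)

    # Shifting to the target mean is an extra choice made in this sketch.
    return (source - src_mean) @ whiten @ recolor + tgt_mean
```

A call such as `coral_align(pool_vectors, test_vectors)` would return vectors whose second-order statistics approximately match those of `test_vectors`; both array names are placeholders rather than identifiers from the paper.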
