Paper title
Are disentangled representations all you need to build speaker anonymization systems?
Authors
Abstract
Speech signals carry sensitive information, such as the speaker's identity, which raises privacy concerns when speech data is collected. Speaker anonymization aims to transform a speech signal so that the source speaker's identity is removed while the spoken content is left unchanged. Current methods perform this transformation by relying on content/speaker disentanglement and voice conversion. Usually, an acoustic model from an automatic speech recognition system extracts the content representation, while an x-vector system extracts the speaker representation. Prior work has shown that the extracted features are not perfectly disentangled. This paper tackles how to improve feature disentanglement, and thus the converted anonymized speech. We propose enhancing the disentanglement by removing speaker information from the acoustic model using vector quantization. Evaluation with the VoicePrivacy 2022 toolkit showed that vector quantization helps conceal the original speaker identity while maintaining utility for speech recognition.
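The vector quantization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the codebook is built or where in the acoustic model it sits, so the fixed two-entry codebook, the 2-D toy features, and the function names `squared_distance` and `quantize` are all assumptions made for illustration. The sketch only shows the general mechanism: each frame is replaced by its nearest codebook entry, so small speaker-dependent variations around the same sound collapse onto the same code.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Replace every frame with its nearest codebook entry.

    Returns the chosen code indices and the quantized frames.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest-neighbour search over the codebook.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Hypothetical toy data: 2-D "content" features for two speakers
# producing the same two sounds with slightly different values.
codebook = [[0.0, 0.0], [1.0, 1.0]]
speaker_a = [[0.1, -0.1], [0.9, 1.1]]
speaker_b = [[-0.2, 0.1], [1.2, 0.8]]

idx_a, q_a = quantize(speaker_a, codebook)
idx_b, q_b = quantize(speaker_b, codebook)
```

In this toy case both speakers map to the same code sequence, so the quantized features no longer distinguish them. In a real system the codebook would presumably be learned jointly with the acoustic model, and its size would trade off how much speaker information is removed against how much content is preserved.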